Abstract

We consider in this paper a contaminated regression model where the distribution of the contaminating component is known, while the Euclidean parameters of the regression model, the noise distribution, the contamination ratio and the distribution of the design data are unknown. Our model is said to be semiparametric in the sense that the probability density function (pdf) of the noise involved in the regression model is not supposed to belong to a parametric density family. When the pdfs of the noise and of the contaminating phenomenon are supposed to be symmetric about zero, we propose an estimator of the various (Euclidean and functional) parameters of the model, and prove its convergence under mild conditions. We prove in particular that, under technical conditions all satisfied in the Gaussian case, the Euclidean part of the model is estimated at the rate $o_{a.s.}(n^{-1/4+\gamma})$, $\gamma > 0$. We recall that, as pointed out in Bordes and Vandekerkhove [5], this result is needed to go further in the asymptotic theory for this class of models. Finally, the implementation and numerical performance of our method are discussed on several toy examples.

Keywords. M-estimator, mixture, regression model, empirical process, semiparametric identifiability, uniform convergence rate.
1 Introduction

Let $(U_i)_{i\ge 1}$ be a sequence of independent and identically distributed (iid) random variables according to a Bernoulli distribution with parameter $p \in (0,1)$. We consider an iid sample $(Z_1,\dots,Z_n)$ where, for all $i = 1,\dots,n$, $Z_i = (X_i, Y_i)$ is a bivariate random variable defined, relative to $U_i$, as follows:

$$Y_i = \begin{cases} a_0 + b_0 X_i + \varepsilon_i^{[0]} & \text{if } U_i = 0,\\ a_1 + b_1 X_i + \varepsilon_i^{[1]} & \text{if } U_i = 1, \end{cases} \qquad (1)$$

where the design sequence $(X_i)_{i\ge 1}$, respectively the errors $(\varepsilon_i^{[j]})_{i\ge 1}$, $j = 0,1$, is a sequence of iid random variables with cumulative distribution function (cdf) $H$, resp. $F_j$, and probability density function (pdf) $h$, resp. $f_j$, $j = 0,1$. We suppose in addition that the design sequence is independent from the errors. This model, called the 2-mixture of regression model, belongs to the wide class of mixture of regression models which has been studied in [29]; see also [26] for a LOS (length of stay) medical problem, [6] for prediction, or [27] in a nonparametric modelling context. Recently Martin-Magniette et al. [21] introduced this model in microarray analysis for the study of the two-color ChIP-chip experiment. Briefly, chromatin immunoprecipitation (ChIP) is a well established procedure to investigate proteins associated with DNA. ChIP-on-chip involves the analysis of DNA recovered from ChIP experiments by hybridization to microarrays. In a two-color ChIP-chip experiment, two samples are compared: DNA fragments crosslinked to a protein of interest (IP) and genomic DNA (input). The goal is then to identify actual binding targets of the IP, i.e. probes whose IP signal is significantly larger than the input signal. In the model proposed by Martin-Magniette et al. [21] the components of the random vector $Z_i = (X_i, Y_i)$, see model (1), correspond respectively to the log-input and log-IP intensities of probe $i$, while the (unknown) status of the probe is characterized through a label $U_i$ which is 1 if the probe is enriched and 0 if it is standard (not enriched). Note also that the assumption made by these authors on the error sequences $(\varepsilon_i^{[j]})_{i\ge 1}$, $j = 0,1$, is that $\varepsilon_i^{[j]} = \varepsilon_i$ for all $(j,i) \in \{0,1\} \times \mathbb{N}^*$, where $\varepsilon_i$ is a Gaussian random variable with mean 0 and variance $\sigma^2$ (homoscedasticity with respect to the probe status $U_i$).
In this work, we propose to weaken this last assumption while completely specifying the regression model under the probe standard condition (the parameter $\theta^{[0]} := (a_0, b_0) \in \mathbb{R}^2$ and $f_0$ are supposed to be entirely known). Note that this kind of assumption arises naturally in microarray analysis, see model (5) and references [1], [12], or [3] p. 744 formula (22), where an analytic expression of $f_0$, characterizing probe expressivity levels under a certain standard condition, is assumed to be available (generally derived from training data and probabilistic computations). In particular we will suppose that, in model (1), the distribution of the $\varepsilon_i^{[1]}$ is seen as a nuisance parameter (it is no longer supposed to belong to a parametric distribution family), turning model (1) into a purely semiparametric model. Note that when $\theta^{[0]}$ is known the observations $Y_i$, for $i = 1,\dots,n$, can be centered according to $Y_i := Y_i - (a_0 + b_0 X_i)$, which implies a simplification of model (1), since we then have

$$Y_i = \begin{cases} \varepsilon_i^{[0]} & \text{if } U_i = 0,\\ \alpha + \beta X_i + \varepsilon_i^{[1]} & \text{if } U_i = 1, \end{cases} \qquad (2)$$

where $\alpha := a_1 - a_0$ and $\beta := b_1 - b_0$. We suppose in model (2), which is from now on our model of interest, that the distribution of the $Z_i = (X_i, Y_i)$ admits a pdf with respect to the Lebesgue measure on $\mathbb{R}^2$ defined by:

$$g(x,y) = h(x)\, g_{Y|X=x}(y) = h(x)\left[p f(y - (\alpha + \beta x)) + (1-p) f_0(y)\right], \quad (x,y) \in \mathbb{R}^2, \qquad (3)$$

where $f$ denotes the unknown pdf of the $\varepsilon_i^{[1]}$, $f_0$ the known pdf of the $\varepsilon_i^{[0]}$, and $h$ the unknown pdf of the $X_i$, $f$ and $f_0$ being supposed to belong to the class of even densities. We will finally denote by $\vartheta := (p, \alpha, \beta) \in (0,1) \times \mathbb{R}^2$ the unknown Euclidean parameter of model (3). Model (2) corresponds exactly to a contaminated version of the semiparametric additive regression model studied in [9], [10] and more recently in [28]. On the other hand model (3) extends for the first time to the bivariate case the class of semiparametric mixture models introduced by Hall and Zhou [13] for $\mathbb{R}^s$-valued observations with $s \ge 3$, and studied later in the univariate case through two specific models:

$$g(y) = p f(y - \mu_1) + (1-p) f(y - \mu_2), \quad y \in \mathbb{R}, \qquad (4)$$

where $(p, \mu_1, \mu_2) \in (0, 1/2) \times \mathbb{R}^2$ and $f$, supposed to be even, are unknown, see [2], [17], [20], and

$$g(y) = p f(y) + (1-p) f_0(y - \mu), \quad y \in \mathbb{R}, \qquad (5)$$

where $(p, \mu) \in (0,1) \times \mathbb{R}$ and $f$ are unknown, $f_0$ is known, and the pdfs $f$ and $f_0$ are supposed to be even, see [3], [5].
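To make the data-generating mechanism of the centered model (2) concrete, the following minimal Python sketch draws pairs $(X_i, Y_i)$ from it. The function name `simulate_model2`, the Gaussian choices for the design and for both noise laws, and the numerical values are illustrative assumptions, not prescriptions of the paper.

```python
import random

def simulate_model2(n, p, alpha, beta, draw_x, draw_eps1, draw_eps0, rng):
    """Draw n pairs (X_i, Y_i) from the centered model (2)."""
    sample = []
    for _ in range(n):
        x = draw_x(rng)
        if rng.random() < p:          # U_i = 1: regression component with noise pdf f
            y = alpha + beta * x + draw_eps1(rng)
        else:                         # U_i = 0: known contaminating component f_0
            y = draw_eps0(rng)
        sample.append((x, y))
    return sample

rng = random.Random(0)
data = simulate_model2(
    n=2000, p=0.7, alpha=2.0, beta=1.0,
    draw_x=lambda r: r.gauss(2.0, 3.0 ** 0.5),   # hypothetical design law
    draw_eps1=lambda r: r.gauss(0.0, 1.0),       # plays the role of f
    draw_eps0=lambda r: r.gauss(0.0, 1.0),       # plays the role of f_0
    rng=rng,
)
# Under (2), E(Y) = p * (alpha + beta * E(X)), here 0.7 * 4 = 2.8.
mean_y = sum(y for _, y in data) / len(data)
```

In this sketch the labels $U_i$ are discarded after use, mirroring the fact that they are unobserved in the model.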

The paper is organized as follows. In Section 2 we present an M-estimating method, inspired by [2], [3] and [5], that allows us to estimate the Euclidean and the functional parameters of model (2); in Section 3 we address the semiparametric identifiability problem associated with expression (3) and establish rates of convergence of our estimators; in Section 4 we discuss the performance of our method on simulated examples and focus our attention on the optimization problems encountered during its implementation. The technical results are relegated to the appendix, which corresponds to Section 5.



2 Estimating method 

In the spirit of [2], [3] and [5], we will suppose that $f$ and $f_0$ are both pdfs symmetric about zero (recall that only $f_0$ is assumed known). To avoid trivial situations or trivial non-identifiability problems (see Remark in Section 3.1), we will impose $p \neq 1$ and $\theta := (\alpha, \beta) \in \Phi \subset \mathbb{R} \times \mathbb{R}^*$, which implies that the Euclidean parameter $\vartheta$ will be assumed to belong to a parametric compact and convex space

$$\Theta := [\delta, 1-\delta] \times \Phi \subset (0,1) \times \{\mathbb{R} \times \mathbb{R}^*\}, \qquad (6)$$

where $\delta \in (0,1)$.

For simplicity, we will endow the spaces $\mathbb{R}^s$, $s \ge 1$, with the $\|\cdot\|_s$ norm (for clarity the dimension $s$ is recalled in index) defined for all $v = (v_1,\dots,v_s)$ by $\|v\|_s = \sum_{j=1}^s |v_j|$, where $|\cdot|$ denotes the absolute value. We now introduce the following non-commutative notation:

$$\theta \odot x := \alpha + \beta x, \quad (\theta, x) \in \Phi \times \mathbb{R}.$$
Following the ideas developed by the authors mentioned above, it is possible to use the symmetry assumption made on $f$ to identify the true value of the Euclidean parameter. The idea consists in noticing that, for $\theta$ fixed in $\Phi$, the sample $(Y_1^\theta,\dots,Y_n^\theta)$ obtained by considering the so-called $\theta$-transformation

$$Y_i^\theta := Y_i - \theta \odot X_i, \quad i = 1,\dots,n, \qquad (7)$$

is distributed according to

$$f_\theta(y) = p_* \int_{\mathbb{R}} f(y + (\theta - \theta_*) \odot x)\, h(x)\,dx + (1 - p_*) \int_{\mathbb{R}} f_0(y + \theta \odot x)\, h(x)\,dx, \qquad (8)$$

where $\vartheta_* = (p_*, \alpha_*, \beta_*) \in \Theta$ denotes the true value of the parameter. Let us observe now that when $\theta = \theta_*$

$$f_{\theta_*}(y) = p_* f(y) + (1 - p_*) \int_{\mathbb{R}} f_0(y + \theta_* \odot x)\, h(x)\,dx. \qquad (9)$$

Remark. When $\theta$ is well fitted ($\theta = \theta_*$) the model associated with the $Y_i^\theta$ is very close to the simple contamination model (5) studied in [3] or [5], where the location $\mu$ is known but the proportion $p$ is unknown.
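The effect of the $\theta$-transformation (7) can be checked on a toy sample. The sketch below is only illustrative: the helper name `theta_transform` and the Gaussian simulation settings are assumptions, and the labels are kept solely to verify the claim (they are unobserved in practice).

```python
import random

def theta_transform(sample, alpha, beta):
    """Return the theta-transformed responses Y_i - (alpha + beta * X_i), see (7)."""
    return [y - (alpha + beta * x) for x, y in sample]

# Toy data from model (2) with hypothetical Gaussian design and noises.
rng = random.Random(1)
p_star, alpha_star, beta_star = 0.7, 2.0, 1.0
sample, labels, noise = [], [], []
for _ in range(500):
    x = rng.gauss(2.0, 3.0 ** 0.5)
    u = 1 if rng.random() < p_star else 0
    eps = rng.gauss(0.0, 1.0)
    y = alpha_star + beta_star * x + eps if u == 1 else eps
    sample.append((x, y))
    labels.append(u)
    noise.append(eps)

y_true = theta_transform(sample, alpha_star, beta_star)
# For U_i = 1 the transformed response at theta_* is the noise eps_i itself,
# which is why the first component of (9) is the symmetric density f.
max_gap = max(abs(yt - e) for yt, u, e in zip(y_true, labels, noise) if u == 1)
```

The points with $U_i = 0$ are instead moved to $\varepsilon_i^{[0]} - \theta_* \odot X_i$, which produces the integral term in (9).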

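Isolating $f$ in (9) and replacing $\vartheta_* = (p_*, \theta_*)$ by $\vartheta = (p, \theta)$ one can define a new parametric class of functions $\mathcal{F}_\Theta := \{f_\vartheta : \vartheta \in \Theta\}$:

$$f_\vartheta(y) = \frac{1}{p} f_\theta(y) - \frac{1-p}{p} \int_{\mathbb{R}} f_0(y + \theta \odot x)\, h(x)\,dx, \quad (y, \vartheta) \in \mathbb{R} \times \Theta, \qquad (10)$$

that satisfies under $\vartheta = \vartheta_*$,

$$f(y) = f_{\vartheta_*}(y) = f_{\vartheta_*}(-y) = f(-y), \quad y \in \mathbb{R}. \qquad (11)$$

The intuition now consists in claiming that, if we make $\vartheta$ vary over $\Theta$ and are able to check that $f_\vartheta$ is symmetric about 0 for a certain value of $\vartheta$, then we have reached the true value of the Euclidean parameter. Note that in the right hand side of (10), the second integral term is in general unknown but can be estimated pointwise by a standard Monte Carlo approach, see expression (16), or a nonparametric Monte Carlo approach, see expression (24). The idea to check this situation, and then to estimate $\vartheta = (p, \theta)$, is to consider a contrast function based on the comparison between the cdf version of $f_\vartheta(y)$

$$H_1(y; \vartheta) := H_1(y; p, F_\theta, J_\theta) := \frac{1}{p} F_\theta(y) - \frac{1-p}{p} J_\theta(y), \quad (y, \vartheta) \in \mathbb{R} \times \Theta,$$

and the cdf version of $f_\vartheta(-y)$

$$H_2(y; \vartheta) := H_2(y; p, F_\theta, J_\theta) := 1 - \frac{1}{p} F_\theta(-y) + \frac{1-p}{p} J_\theta(-y), \quad (y, \vartheta) \in \mathbb{R} \times \Theta,$$

where for all $\theta \in \Phi$

$$J_\theta(y) := \int_{-\infty}^{y} I_\theta(z)\,dz, \quad y \in \mathbb{R}, \quad \text{with} \quad I_\theta(z) := \int_{\mathbb{R}} f_0(z + \theta \odot x)\, h(x)\,dx, \quad z \in \mathbb{R},$$

and

$$F_\theta(y) := \int_{-\infty}^{y} f_\theta(z)\,dz, \quad y \in \mathbb{R}.$$

Notice that for all $\theta$ fixed in $\Phi$, $J_\theta(\cdot)$ and $F_\theta(\cdot)$ are the cdfs associated respectively with the $\theta$-transformed known component population (the $Y_i$ such that $U_i = 0$ in (2)) and the $\theta$-transformed whole data. Let us define the following function

$$H(y; \vartheta) := H_1(y; \vartheta) - H_2(y; \vartheta), \quad (y, \vartheta) \in \mathbb{R} \times \Theta. \qquad (12)$$

Notice that under $\vartheta_*$, using the symmetry of $f$,

$$H(y; \vartheta_*) = 0, \quad y \in \mathbb{R}.$$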
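In a fully Gaussian setting the cdfs $F_\theta$ and $J_\theta$ are available in closed form, so the cancellation of $H(\cdot;\vartheta_*)$ can be checked numerically. The following sketch assumes, purely for illustration, $f = \mathcal{N}(0,\sigma^2)$, $f_0 = \mathcal{N}(0,\sigma_0^2)$ and $X \sim \mathcal{N}(\mu_x, s_x^2)$; the function `make_H` and its default values are hypothetical.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def make_H(p_star, a_star, b_star, sigma=1.0, sigma0=1.0, mu_x=2.0, s_x=math.sqrt(3.0)):
    """Build H(y; p, alpha, beta) of (12) under the Gaussian assumptions above."""
    def J(y, a, b):
        # cdf of eps0 - (a + b X): Gaussian with mean -(a + b mu_x)
        return norm_cdf((y + a + b * mu_x) / math.sqrt(sigma0 ** 2 + (b * s_x) ** 2))

    def F(y, a, b):
        # cdf of the theta-transformed data, i.e. the integral of (8)
        da, db = a - a_star, b - b_star
        comp1 = norm_cdf((y + da + db * mu_x) / math.sqrt(sigma ** 2 + (db * s_x) ** 2))
        return p_star * comp1 + (1.0 - p_star) * J(y, a, b)

    def H(y, p, a, b):
        h1 = F(y, a, b) / p - (1.0 - p) / p * J(y, a, b)
        h2 = 1.0 - F(-y, a, b) / p + (1.0 - p) / p * J(-y, a, b)
        return h1 - h2

    return H

H = make_H(0.7, 2.0, 1.0)
grid = [-6.0 + 0.5 * k for k in range(25)]
max_at_truth = max(abs(H(y, 0.7, 2.0, 1.0)) for y in grid)     # numerically zero
max_elsewhere = max(abs(H(y, 0.7, 1.0, 0.5)) for y in grid)    # clearly non-zero
```

At a wrong value of $\theta$ the function $H$ departs markedly from zero, which is the property exploited by the contrast introduced next.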

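To avoid numerical integration in the approximation of an empirical contrast function based on the comparison of $H_1$ and $H_2$ over $\mathbb{R}$, we proceed as follows. Let $Q$ be an instrumental weight probability distribution with pdf $q$ with respect to the Lebesgue measure. We suppose that $q$ is strictly positive over $\mathbb{R}$ and easy to simulate. Then we consider

$$d(\vartheta) := \int_{\mathbb{R}} H^2(y, \vartheta)\,dQ(y), \qquad (13)$$

where obviously $d(\vartheta) \ge 0$ for all $\vartheta \in \Theta$ and $d(\vartheta_*) = 0$. Let $(V_1,\dots,V_n)$ be an iid sample from $Q$. An empirical version $d_n(\cdot)$ of $d(\cdot)$ can be obtained by considering

$$d_n(\vartheta) := \frac{1}{n} \sum_{i=1}^{n} H^2(V_i; p, \tilde F_{n,\theta}, \hat J_{n,\theta}), \quad \vartheta \in \Theta, \qquad (14)$$

where

$$\hat J_{n,\theta}(y) := \int_{-\infty}^{y} \hat I_{n,\theta}(z)\,dz, \quad (y, \theta) \in \mathbb{R} \times \Phi, \qquad (15)$$

with

$$\hat I_{n,\theta}(z) := \frac{1}{n} \sum_{i=1}^{n} f_0(z + \theta \odot X_i), \quad (z, \theta) \in \mathbb{R} \times \Phi, \qquad (16)$$

which leads actually to the simple expression for $\hat J_{n,\theta}(y)$

$$\hat J_{n,\theta}(y) = \frac{1}{n} \sum_{i=1}^{n} F_0(y + \theta \odot X_i), \quad (y, \theta) \in \mathbb{R} \times \Phi, \qquad (17)$$

and where $\tilde F_{n,\theta}$ denotes a smooth version of the empirical cdf

$$F_{n,\theta}(y) := \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{Y_i^\theta \le y\}},$$

defined by

$$\tilde F_{n,\theta}(y) := \int_{-\infty}^{y} \hat f_{n,\theta}(t)\,dt, \quad (y, \theta) \in \mathbb{R} \times \Phi, \qquad (18)$$

where

$$\hat f_{n,\theta}(t) := \frac{1}{n b_n} \sum_{i=1}^{n} K\left(\frac{t - Y_i^\theta}{b_n}\right). \qquad (19)$$

In (19), we assume the standard condition insuring, for each $\theta \in \Phi$, the $L_1$ convergence of $\hat f_{n,\theta}$ towards $f_\theta$ defined in (8) (see Devroye [11]), namely

$$b_n \to 0, \quad n b_n \to +\infty, \qquad (20)$$

and $K$ is a symmetric density function. Finally we propose to estimate $\vartheta_*$ by considering the M-estimator

$$\hat\vartheta_n := (\hat p_n, \hat\theta_n) = \arg\min_{\vartheta \in \Theta} d_n(\vartheta). \qquad (21)$$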
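The empirical contrast (14) only requires evaluations of $F_0$ and of a kernel-smoothed empirical cdf. The sketch below is one possible reading of (14), (17) and (18), assuming a Gaussian kernel (so that the integral in (18) is a mean of Gaussian cdfs), a Gaussian $f_0$, a Gaussian weight law $Q$ and an arbitrary bandwidth $n^{-1/4}$; all names and numerical settings are illustrative assumptions.

```python
import math
import random

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def d_n(p, a, b, sample, weights_sample, bandwidth, F0=norm_cdf):
    """Empirical contrast (14) with J from (17) and a Gaussian-kernel version of (18)."""
    n = len(sample)
    shifts = [a + b * x for x, _ in sample]                  # theta (.) X_i
    y_theta = [y - s for (_, y), s in zip(sample, shifts)]   # transformed data (7)

    def J_hat(y):      # Monte Carlo estimate (17)
        return sum(F0(y + s) for s in shifts) / n

    def F_tilde(y):    # integral up to y of the Gaussian-kernel density estimate
        return sum(norm_cdf((y - yt) / bandwidth) for yt in y_theta) / n

    total = 0.0
    for v in weights_sample:
        h1 = F_tilde(v) / p - (1.0 - p) / p * J_hat(v)
        h2 = 1.0 - F_tilde(-v) / p + (1.0 - p) / p * J_hat(-v)
        total += (h1 - h2) ** 2
    return total / len(weights_sample)

rng = random.Random(2)
n = 300
sample = []
for _ in range(n):
    x = rng.gauss(2.0, math.sqrt(3.0))
    eps = rng.gauss(0.0, 1.0)
    sample.append((x, 2.0 + x + eps) if rng.random() < 0.7 else (x, eps))
V = [rng.gauss(0.0, 4.0) for _ in range(n)]     # draws from the weight law Q
b_n = n ** (-0.25)                              # illustrative bandwidth only

d_truth = d_n(0.7, 2.0, 1.0, sample, V, b_n)    # contrast at the true parameter
d_wrong = d_n(0.7, 0.0, 0.0, sample, V, b_n)    # contrast with no transformation
```

Minimizing `d_n` over $(p,\alpha,\beta)$ with any numerical optimizer then gives an approximation of the M-estimator (21).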

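Once $\vartheta_*$ is estimated by $\hat\vartheta_n$, a natural way to estimate $F$ and $f$ consistently is then to consider the plug-in empirical versions of $H_1(\cdot; \vartheta)$ and (10), respectively defined for all $y \in \mathbb{R}$ by

$$\hat F_n(y) := H_1(y; \hat p_n, \tilde F_{n,\hat\theta_n}, \tilde J_{n,\hat\theta_n}), \qquad (22)$$

$$\hat f_n(y) := \frac{1}{\hat p_n} \hat f_{n,\hat\theta_n}(y) - \frac{1 - \hat p_n}{\hat p_n} \tilde I_{n,\hat\theta_n}(y), \qquad (23)$$

where, for all $\theta \in \Phi$, $\tilde I_{n,\theta}$ and $\tilde J_{n,\theta}$ are respectively nonparametric estimators of $I_\theta$ and $J_\theta$ based on an iid simulated sample $(\varepsilon_1^0,\dots,\varepsilon_n^0)$ from $f_0$, obtained by considering

$$\tilde I_{n,\theta}(z) := \frac{1}{n b_n} \sum_{i=1}^{n} K\left(\frac{z - (\varepsilon_i^0 - \theta \odot X_i)}{b_n}\right), \qquad (24)$$

$$\tilde J_{n,\theta}(y) := \int_{-\infty}^{y} \tilde I_{n,\theta}(z)\,dz. \qquad (25)$$

For convenience, the kernel used to compute (24) will be Gaussian, i.e. $K(t) = \mathcal{N}_{0,1}(t)$ where $\mathcal{N}_{m,\sigma^2}(t) := (2\pi\sigma^2)^{-1/2} \exp\left(-(t - m)^2 / (2\sigma^2)\right)$, for all $t \in \mathbb{R}$. In this second plug-in step we consider, for the sake of simplicity in our proofs, the nonparametric estimates (25) and (24) instead of (15) and (16). This choice allows us to use similar nonparametric results for both $\hat f_{n,\theta}$ and $\tilde I_{n,\theta}$ (see the proof of Theorem 3.1 ii) and iii)), but the same results should be obtained, at the price of an additional technical lemma, by considering directly the Monte Carlo estimators (15) and (16).

3 Identifiability and consistency

3.1 Identifiability

In this section we recall briefly why model (3) is identifiable under conditions similar to those established in [3] and summarized below. Let us define $\mathcal{F}_s := \{f \in \mathcal{F};\ \int_{\mathbb{R}} |x|^s f(x)\,dx < +\infty\}$ for $s \ge 1$, where $\mathcal{F}$ denotes the set of even pdfs. When $(f, f_0) \in \mathcal{F}_s^2$ with $s \ge 2$, we denote $m := \int_{\mathbb{R}} x^2 f(x)\,dx$ and $m_0 := \int_{\mathbb{R}} x^2 f_0(x)\,dx$.

Definition 3.1 (Identifiability). Let $(p_1, \theta_1, f_1, h_1)$ and $(p_2, \theta_2, f_2, h_2)$ denote two sets of parameters for model (3). The parameter in model (3) is said to be semiparametrically identifiable if

$$(p_1, \theta_1, f_1(y), h_1(x)) = (p_2, \theta_2, f_2(y), h_2(x)),$$

for $\lambda^{\otimes 2}$-almost all $(x,y) \in \mathbb{R}^2$, whenever we have

$$\left(p_1 f_1(y - \theta_1 \odot x) + (1 - p_1) f_0(y)\right) h_1(x) = \left(p_2 f_2(y - \theta_2 \odot x) + (1 - p_2) f_0(y)\right) h_2(x), \qquad (26)$$

for $\lambda^{\otimes 2}$-almost all $(x,y) \in \mathbb{R}^2$.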

Lemma 3.1 If the Euclidean parameter space $\Phi$ is a subset of $\mathbb{R} \times \mathbb{R}^* \setminus \{(0,0)\}$, $\mathrm{supp}(f) = \mathrm{supp}(f_0) = \mathbb{R}$, $\mathrm{supp}(h)$ contains at least two intervals respectively in the neighborhood of 0 and $+\infty$ (or $-\infty$), and the pdfs involved in model (3) satisfy $(f_0, f) \in \mathcal{F}_3 \times \mathcal{F}_3$, then the parameter in model (3) is identifiable.

Proof. Integrating (26) with respect to $y$ over $\mathbb{R}$, we obtain that $h_1(\cdot) = h_2(\cdot)$ $\lambda$-almost everywhere. Let $h(x) := h_1(x)$ for all $x \in \mathrm{supp}(h) := \mathrm{supp}(h_1) \cap \mathrm{supp}(h_2)$. Notice now that, for all $x \in \mathrm{supp}(h)$, (26) coincides with (5) when considering the generic location parameter $\mu$ equal to $\theta \odot x$. In our case the first three conditional moment equations (given $\{X = x\}$) associated with (26) lead to

$$\begin{aligned} p_1\, \theta_1 \odot x &= p_2\, \theta_2 \odot x,\\ (1 - p_1) m_0 + p_1\left((\theta_1 \odot x)^2 + m_1\right) &= (1 - p_2) m_0 + p_2\left((\theta_2 \odot x)^2 + m_2\right),\\ p_1\left(3 (\theta_1 \odot x) m_1 + (\theta_1 \odot x)^3\right) &= p_2\left(3 (\theta_2 \odot x) m_2 + (\theta_2 \odot x)^3\right). \end{aligned} \qquad (27)$$

According to [3], the solutions are either, for all $x \in \mathrm{supp}(h)$, $(p_1, \theta_1 \odot x) = (p_2, \theta_2 \odot x)$, which implies $(p_1, \alpha_1, \beta_1) = (p_2, \alpha_2, \beta_2)$, or

$$\begin{aligned} p_2 &= p_1 \left(\frac{2 (\theta_1 \odot x)^2}{3 m_1 + (\theta_1 \odot x)^2 - 3 m_0}\right),\\ \theta_2 \odot x &= \theta_1 \odot x + \frac{3 m_1 - (\theta_1 \odot x)^2 - 3 m_0}{2\, \theta_1 \odot x},\\ m_2 &= m_1 + \frac{\left(m_1 + (\theta_1 \odot x)^2 - m_0\right)\left(3 m_1 + (\theta_1 \odot x)^2 - 3 m_0\right)}{4 (\theta_1 \odot x)^2}. \end{aligned} \qquad (28)$$

Suppose that $\beta_1 \neq 0$ and take the limit as $x \to +\infty$ in the first row of (28). We then necessarily obtain that $p_2 = 2 p_1$, which is only compatible, when we take the limit as $x \to 0$, with $m_1 = m_0$. Hence if $m_1 \neq m_0$ model (3) is always identifiable. If we suppose $m_1 = m_0$, the second row of (28) leads to $\theta_2 \odot x = (\theta_1 \odot x)/2$. If we introduce this last relation in the third row of (28) we obtain

$$m_2 - m_1 = \frac{(\theta_1 \odot x)^2}{4}, \quad x \in \mathbb{R},$$

which is impossible when $x \to +\infty$ and thus provides us the global identifiability of model (3). □

Remark. In Lemma 3.1 we have considered for simplicity the case where the slope parameter $\beta$ is supposed to be different from zero. Actually this condition can be technically relaxed if we allow $\theta$ to be equal to $(\alpha, 0)$ with $\alpha \neq 0$. In fact, considering the first row of (28) and taking the limit as $x \to +\infty$, we obtain $\beta_2 = \beta_1 = 0$. To conclude, it is then enough to integrate (26) with respect to $x$ over $\mathbb{R}$, which leads to discussing the same condition as in [3], p. 735 expression (3). Then Proposition 2 in [3] provides an almost everywhere-type identifiability result which unfortunately cannot be strictly compared to the result stated in Lemma 3.1. For this reason we decided to reject $\theta = (\alpha, 0)$, $\alpha \in \mathbb{R}^*$, from the sub-parametric space $\Phi$, see (6).

3.2 Assumptions and statistical complexity

In the following we provide some general conditions that allow us to control the statistical complexity of our model and that insure the validity of basic asymptotic results.

Regularity conditions (R).

i) The pdfs $f$ and $f_0$ are strictly positive over $\mathbb{R}$ and belong to $\mathcal{F}_3$.

ii) The pdfs $f$ and $f_0$ are twice differentiable over $\mathbb{R}$ with $\|f^{(j)}\|_\infty < \infty$ and $\|f_0^{(j)}\|_\infty < \infty$, where $f^{(j)}$ and $f_0^{(j)}$ denote respectively the $j$-th order derivatives of $f$ and $f_0$, for $j = 1, 2$.

iii) The pdf $h$ satisfies $\int_{\mathbb{R}} |x|^3 h(x)\,dx < \infty$.

iv) For $i = 0$ or $i = 2$,

$$\int_{\mathbb{R}^2} |y|^i \left|F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right| h(x)\,dx\,dy < \infty,$$

and for $i = 1$ or $i = 3$, and all $u \in \mathbb{R}$,

$$\lim_{y \to \pm\infty} y^i \left(F_0(y + u) - F_0(y - u)\right) = 0.$$

v) There exist two collections of functions $\{\ell_{i,j}\}_{0 \le i \le j \le 2}$ and $\{\ell^0_{i,j}\}_{0 \le i \le j \le 2}$ belonging to $L_1(\mathbb{R}^2)$ and such that, for all $(x,y) \in \mathbb{R}^2$ and all $\theta \in \Phi$

$$|x^i f^{(j)}(y + (\theta - \theta_*) \odot x)|\, h(x) \le \ell_{i,j}(x, y),$$

and

$$|x^i f_0^{(j)}(y + \theta \odot x)|\, h(x) \le \ell^0_{i,j}(x, y).$$
For all $z \in \mathbb{C}$, let $\bar z$ and $\Im(z)$ denote the conjugate and imaginary part of $z$, respectively. We will also denote by $\bar f$, $\bar f_0$ the Fourier transforms of $f$, $f_0$, and define for all $\kappa = (\kappa_1, \kappa_2) \in \mathbb{R}^2$, $\psi_\kappa(t) := e^{i \kappa_1 t}\, \bar h(\kappa_2 t)$, where $\bar h$ denotes the Fourier transform of $h$.

The following conditions mainly insure the contrast property for the function $d$ defined in (13). We point out that these conditions are not equivalent, as is the case in [5] p. 25, to those established to prove the identifiability property in Lemma 3.1. Loosely speaking, this difference is due to the $\theta$-transformation, which reduces the Euclidean parameter estimation problem to the analysis of a collection of one-dimensional data, i.e. the $Y_i^\theta$ with $\theta \in \Phi$, whereas the proof of Lemma 3.1 strongly uses the bivariate structure of the original data.

Contrast condition (C).

i) The first three moments of $X$ satisfy $4E(X)^3 + 3E(X)E(X^2) + E(X^3) \neq 0$.

ii) The set of parameters $\vartheta = (p, \theta) = (p, \alpha, \beta)$ with $p \neq p_*$ that satisfy

$$p_*\, \Im\left(\psi_{\theta - \theta_*}(t)\right) \bar f(t) = (p_* - p)\, \Im\left(\psi_\theta(t)\right) \bar f_0(t), \quad t \in \mathbb{R}, \qquad (29)$$

is empty or does not belong to the parametric space $\Theta$.
iii) The second order moments of $f$ and $f_0$, respectively denoted $m$ and $m_0$, are supposed to satisfy

$$m \neq m_0 + \frac{\int_{\mathbb{R}} (\alpha_* + \beta_* x)^3 h(x)\,dx}{3(\alpha_* + \beta_* E(X))}.$$

Remark. Note that condition C ii), which is necessary to prove that $d$ is a contrast function over $\Theta$, cannot be simplified without more information on $f$, $f_0$ and $h$. We suggest, in the spirit of conditions C1 and C2 in [16], to consider the sufficient and more intuitive regularity comparison-type criterion for C ii)

(30)

which is valid since, according to (29), the term on the left hand side of (30) is equal to $|p - p_*|/p_* \in [|p - p_*|, 1/\delta]$, which is in contradiction with (30). However condition (29) can directly be discussed in the Gaussian case, as done in the appendix, Section 5.1. We prove in particular that there sometimes exist spurious solutions satisfying (29) that have to be removed from the parametric space so that they are not detected by our estimation algorithm, as shown in Fig. 2.

Kernel and bandwidth conditions (K).

i) The even kernel density function $K$ is bounded, uniformly continuous, square integrable, of bounded variation and has a second order moment.

ii) The bandwidth $b_n$ satisfies $b_n \searrow 0$, $n b_n \to +\infty$ and $\sqrt{n}\, b_n^2 = o(1)$.

Lemma 3.2 (i) Under conditions (R) the function $d$ is Lipschitz over $\Theta$.

(ii) Under conditions (C) i) and ii) the function $d$ is a contrast function, i.e. for all $\vartheta \in \Theta$, $d(\vartheta) \ge 0$ and $d(\vartheta) = 0$ if and only if $\vartheta = \vartheta_*$.

(iii) Under condition (C) iii) we have

$$\ddot d(\vartheta_*) = 2 \int_{\mathbb{R}} \dot H(y, \vartheta_*)\, \dot H^T(y, \vartheta_*)\,dQ(y) > 0.$$

(iv) Under conditions (R) and (K), for any $\gamma > 0$, $d_n$ converges to $d$ almost surely with the rate

$$\sup_{\vartheta \in \Theta} |d_n(\vartheta) - d(\vartheta)| = o_{a.s.}(n^{-1/2+\gamma}).$$

Remark. There exists a simple consistent method to select, in the $L_1(\mathbb{R}^2)$ sense (recall that our nonparametric consistency results are established for this norm), the best estimator in case of multiple minima of $d_n$ (which should raise the suspicion that condition (C) is violated). Suppose that, for $n$ fixed in $\mathbb{N}^*$, there exists a finite collection of local minima of $d_n$ denoted by $\hat\vartheta_n^{[i]} = (\hat p_n^{[i]}, \hat\theta_n^{[i]})$ with $1 \le i \le S < \infty$. Then we propose to retain a $\hat\vartheta_n$ (in practice unique) satisfying

$$\hat\vartheta_n = \hat\vartheta_n^{[i_*]}, \quad \text{where} \quad i_* = \arg\min_{1 \le i \le S} \|\hat g_n - \hat g_n^{[i]}\|_{L_1},$$

and where for all $1 \le i \le S$, $\hat g_n^{[i]}$ is the plug-in posterior estimator of $g$ defined by

$$\hat g_n^{[i]} = \hat p_n^{[i]} \hat f_n^{[i]} + (1 - \hat p_n^{[i]}) f_0, \qquad (31)$$

where $\hat f_n^{[i]}$ corresponds to $\hat f_n$ defined in (23), when $\hat\vartheta_n = \hat\vartheta_n^{[i]}$. Proceeding in that way, we clearly favor the Euclidean parameter estimate that best fits the dataset, this approach being asymptotically consistent as long as the model is identifiable.

Proof. i) From the boundedness and the uniform Lipschitz property of $H(\cdot, \vartheta)$, along with the integrability and the integrable Lipschitz property of $f_\theta(\cdot)$ proved in Sections 5.3, 5.4 and 5.5, there exists a nonnegative constant $c$ such that for all $(\vartheta, \vartheta') \in \Theta^2$

$$\left|\int_{\mathbb{R}} H^2(y, \vartheta)\,dQ(y) - \int_{\mathbb{R}} H^2(y, \vartheta')\,dQ(y)\right| \le \int_{\mathbb{R}} |H(y, \vartheta) + H(y, \vartheta')|\, |H(y, \vartheta) - H(y, \vartheta')|\, q(y)\,dy \le c\, \|\vartheta - \vartheta'\|_3,$$

which concludes the proof of i).
ii) To clarify the similarity between the semiparametric contamination model (5) studied in [3] and the contaminated regression model (3), we can say that $f_\theta(\cdot)$ plays the role of $g(\cdot - \mu)$ and that $I_\theta(\cdot)$ plays the role of $f_0(\cdot - \mu)$.

If $\vartheta = \vartheta_*$ then $d(\vartheta) = 0$. To prove the converse we notice that $d(\vartheta) = 0$ implies, since $H_1(\cdot, \vartheta)$ and $H_2(\cdot, \vartheta)$ are continuous and $q > 0$ over $\mathbb{R}$, that $H_1(\cdot; \vartheta) = H_2(\cdot; \vartheta)$, which leads, for almost all $y \in \mathbb{R}$, to

$$f_\theta(y) - (1 - p) I_\theta(y) = f_\theta(-y) - (1 - p) I_\theta(-y). \qquad (32)$$

Using formula (8), we obtain

$$p_* \int_{\mathbb{R}} f(y + (\theta - \theta_*) \odot x)\, h(x)\,dx + (p - p_*) I_\theta(y) = p_* \int_{\mathbb{R}} f(-y + (\theta - \theta_*) \odot x)\, h(x)\,dx + (p - p_*) I_\theta(-y), \quad y \in \mathbb{R}.$$

Considering the Fourier transform of the previous equality, using Fubini's theorem, and noticing that $f$ and $f_0$ are real-valued even functions, it follows that

$$p_* e^{-it(\alpha - \alpha_*)} \bar f(t)\, \bar h(-(\beta - \beta_*) t) + (p - p_*) e^{-it\alpha} \bar f_0(t)\, \bar h(-\beta t) = p_* e^{it(\alpha - \alpha_*)} \bar f(t)\, \bar h((\beta - \beta_*) t) + (p - p_*) e^{it\alpha} \bar f_0(t)\, \bar h(\beta t), \quad t \in \mathbb{R}.$$

Using the notation introduced for the writing of condition (C), the previous equation becomes (29).

Suppose that $p = p_*$ and take the first and third order derivatives of (29) at point $t = 0$. We then obtain $\alpha - \alpha_* + (\beta - \beta_*) E(X) = 0$ and

$$3m\left[\alpha - \alpha_* + (\beta - \beta_*) E(X)\right] + (\alpha - \alpha_*)^3 + 3(\alpha - \alpha_*)^2 (\beta - \beta_*) E(X) + 3(\alpha - \alpha_*)(\beta - \beta_*)^2 E(X^2) + (\beta - \beta_*)^3 E(X^3) = 0,$$

which naturally leads to

$$(\beta - \beta_*)^3 \left(4E(X)^3 + 3E(X)E(X^2) + E(X^3)\right) = 0, \qquad (33)$$

and thus implies that $\theta = \theta_*$ if $4E(X)^3 + 3E(X)E(X^2) + E(X^3) \neq 0$.

Suppose now that $p \neq p_*$; then condition (C) ii) excludes any such solution of (29) in $\Theta$, so that necessarily $\vartheta = \vartheta_*$.



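iii) First we have

$$\ddot d(\vartheta_*) = 2 \int_{\mathbb{R}} \dot H(y, \vartheta_*)\, \dot H^T(y, \vartheta_*)\, q(y)\,dy, \qquad (34)$$

according to (9) and the fact that $H(\cdot, \vartheta_*) = 0$ on $\mathbb{R}$. Let $v$ be a vector in $\mathbb{R}^3$. We have

$$v^T \ddot d(\vartheta_*)\, v = 2 \int_{\mathbb{R}} \left(v^T \dot H(y, \vartheta_*)\right)^2 q(y)\,dy \ge 0. \qquad (35)$$

It follows that $\ddot d(\vartheta_*)$ is a positive $3 \times 3$ real-valued matrix. Let us show that it is also definite. If $v \in \mathbb{R}^3$ is a non-null column vector such that $v^T \ddot d(\vartheta_*)\, v = 0$, then $v^T \dot H(y, \vartheta_*) = 0$ for almost all $y \in \mathbb{R}$. According to (48) in the appendix, we have to discuss the proportionality of $f$ and $J_{\theta_*}(\cdot) + J_{\theta_*}(-\cdot) - 1$. Because $f_0$ is an even density, we have from Fubini's theorem

$$f(y) = \frac{\int_{\mathbb{R}} \left[F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right] h(x)\,dx}{\int_{\mathbb{R}^2} \left[F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right] h(x)\,dx\,dy}. \qquad (36)$$

Using integration by parts and assumption (R) iv), the denominator of the right hand side of (36) can be expressed as follows

$$\begin{aligned} \int_{\mathbb{R}^2} \left[F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right] h(x)\,dx\,dy &= \int_{\mathbb{R}} \Big[ y \left(F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right) \Big]_{-\infty}^{+\infty} h(x)\,dx\\ &\quad - \int_{\mathbb{R}} \int_{\mathbb{R}} y \left(f_0(y + \theta_* \odot x) - f_0(y - \theta_* \odot x)\right) dy\, h(x)\,dx\\ &= 2 \int_{\mathbb{R}} (\alpha_* + \beta_* x)\, h(x)\,dx = 2(\alpha_* + \beta_* E(X)). \end{aligned}$$

If we now calculate the second-order moment of $f$ we obtain

$$m := \int_{\mathbb{R}} y^2 f(y)\,dy = \frac{\int_{\mathbb{R}^2} y^2 \left[F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right] h(x)\,dx\,dy}{2(\alpha_* + \beta_* E(X))}.$$

Using integration by parts and assumption (R) iv), the numerator of the right hand side of the previous expression can be expressed as follows

$$\begin{aligned} \int_{\mathbb{R}^2} y^2 \left[F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right] h(x)\,dx\,dy &= \int_{\mathbb{R}} \Big[ \frac{y^3}{3} \left(F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)\right) \Big]_{-\infty}^{+\infty} h(x)\,dx\\ &\quad - \int_{\mathbb{R}} \int_{\mathbb{R}} \frac{y^3}{3} \left(f_0(y + \theta_* \odot x) - f_0(y - \theta_* \odot x)\right) dy\, h(x)\,dx\\ &= 2 \int_{\mathbb{R}^2} \left[\frac{(\theta_* \odot x)^3}{3} + (\theta_* \odot x)\, u^2\right] f_0(u)\,du\, h(x)\,dx\\ &= 2 m_0 (\alpha_* + \beta_* E(X)) + \frac{2}{3} \int_{\mathbb{R}} (\alpha_* + \beta_* x)^3 h(x)\,dx, \end{aligned}$$

so that $m = m_0 + \int_{\mathbb{R}} (\alpha_* + \beta_* x)^3 h(x)\,dx \,/\, \left(3(\alpha_* + \beta_* E(X))\right)$, which leads to a contradiction if (C) iii) is assumed.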
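The two integration-by-parts identities used in this part of the proof can be checked numerically for a fixed value $a$ of $\theta_* \odot x$, namely $\int [F_0(y+a) - F_0(y-a)]\,dy = 2a$ and $\int y^2 [F_0(y+a) - F_0(y-a)]\,dy = 2 a m_0 + \tfrac{2}{3} a^3$. The sketch below assumes a Gaussian $f_0$ and uses a plain trapezoidal rule; the helper names and the value of $a$ are illustrative.

```python
import math

def norm_cdf(z, sd=1.0):
    return 0.5 * (1.0 + math.erf(z / (sd * math.sqrt(2.0))))

def integrate(func, lo, hi, steps=20000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * h)
    return total * h

a, sd0 = 1.5, 1.0            # a stands for theta_* (.) x; f0 = N(0, sd0^2) is assumed
m0 = sd0 ** 2                # second-order moment of f0

def diff(y):
    return norm_cdf(y + a, sd0) - norm_cdf(y - a, sd0)

zeroth = integrate(diff, -12.0, 12.0)                          # should be close to 2a
second = integrate(lambda y: y * y * diff(y), -12.0, 12.0)     # close to 2*a*m0 + 2*a^3/3
```

Integrating these identities against $h$ gives the denominator $2(\alpha_* + \beta_* E(X))$ and the numerator appearing in the expression of $m$.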



iv) This proof, which is a tricky generalization of the proof of Lemma 3.2 iii) given in [5], is relegated to the appendix for convenience, see Section 5.6. □

Theorem 3.1 i) If assumptions (R), (C) and (K) are satisfied then

$$\|\hat\vartheta_n - \vartheta_*\|_3 = o_{a.s.}(n^{-1/4+\gamma}), \quad \gamma > 0.$$

ii) The estimator $\hat f_n$ of $f$ defined in (23) converges almost surely in the $L_1$ sense if $n^{-1/4+\gamma}/b_n \to 0$, for all $\gamma > 0$.

iii) For any $\gamma > 0$, the estimator $\hat F_n$ of $F$ defined in (22) converges uniformly at the following almost sure rate

$$\|\hat F_n - F\|_\infty = O_{a.s.}(n^{-1/4+\gamma}/b_n) + O_{a.s.}(b_n), \quad \gamma > 0. \qquad (37)$$

The above rate is optimized by considering $b_n = n^{-1/8}$, a choice which provides the rate of convergence $O_{a.s.}(n^{-1/8+\gamma})$, for all $\gamma > 0$.

Comment. Points ii) and iii) reveal the intuitive idea that the bandwidth $b_n$ must not decrease too fast in order to allow the appropriate positioning of the plug-in-centered data in the expression of $\hat f_{n,\hat\theta_n}$. In fact the $Y_i^{\hat\theta_n}$ need to be sufficiently close to the $Y_i^{\theta_*}$, and $b_n$ not too small (the smaller $b_n$ is, the more we "freeze" the kernel estimator), if we want a good agreement between $\hat f_{n,\hat\theta_n}$ and $\hat f_{n,\theta_*}$, the latter being known to converge to the true $f_{\theta_*}$ involved in expression (9).



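Proof. i) The proof follows entirely the proof of Theorem 3.1 in [5] and uses the technical results proved in Lemma 3.2.

ii) Consider the following decomposition:

$$\begin{aligned} |\hat f_n - f| &= \left| \frac{1}{\hat p_n} \hat f_{n,\hat\theta_n} - \frac{1 - \hat p_n}{\hat p_n} \tilde I_{n,\hat\theta_n} - \frac{1}{p_*} f_{\theta_*} + \frac{1 - p_*}{p_*} I_{\theta_*} \right|\\ &\le \frac{1}{\hat p_n} \left|\hat f_{n,\hat\theta_n} - \hat f_{n,\theta_*}\right| + \frac{1}{\hat p_n} \left|\hat f_{n,\theta_*} - f_{\theta_*}\right| + \left|\frac{1}{\hat p_n} - \frac{1}{p_*}\right| f_{\theta_*}\\ &\quad + \frac{1 - \hat p_n}{\hat p_n} \left|\tilde I_{n,\hat\theta_n} - \tilde I_{n,\theta_*}\right| + \frac{1 - \hat p_n}{\hat p_n} \left|\tilde I_{n,\theta_*} - I_{\theta_*}\right| + \left|\frac{1 - \hat p_n}{\hat p_n} - \frac{1 - p_*}{p_*}\right| I_{\theta_*}. \end{aligned} \qquad (38)$$

It is now enough to study the behavior of $|\hat f_{n,\hat\theta_n} - \hat f_{n,\theta_*}|$ and $|\tilde I_{n,\hat\theta_n} - \tilde I_{n,\theta_*}|$. For all $t \in \mathbb{R}$, we have

$$\left|\hat f_{n,\hat\theta_n}(t) - \hat f_{n,\theta_*}(t)\right| \le \frac{1}{n b_n} \sum_{k=1}^{n} \left| K\left(\frac{t - Y_k^{\hat\theta_n}}{b_n}\right) - K\left(\frac{t - Y_k^{\theta_*}}{b_n}\right) \right|. \qquad (39)$$

Consider $K$ a centered normalized Gaussian kernel. We propose to study in a generic way the difference of kernels involved in the right hand side of the above expression. For all $(w, z) \in \mathbb{R}^2$, and letting $h := (z - w)/b$, we write the second-order Taylor expansion with integral remainder term:

$$K\left(\frac{t - w}{b}\right) = K\left(\frac{t - z}{b}\right) + h K'\left(\frac{t - z}{b}\right) + h^2 \int_0^1 (1 - u)\, K''\left(\frac{t - z_u}{b}\right) du,$$

where $z_u := (1 - u) z + u w$. Noticing that

$$K'\left(\frac{t - z}{b}\right) = -(t - z)\, \mathcal{N}_{z,b^2}(t), \quad \text{and} \quad K''\left(\frac{t - z_u}{b}\right) = b \left(\frac{(t - z_u)^2}{b^2} - 1\right) \mathcal{N}_{z_u,b^2}(t),$$

it thus follows that

$$\begin{aligned} \int_{\mathbb{R}} \left| K\left(\frac{t - w}{b}\right) - K\left(\frac{t - z}{b}\right) \right| dt &\le |h| \int_{\mathbb{R}} |t - z|\, \mathcal{N}_{z,b^2}(t)\,dt + \frac{h^2 b}{2} \sup_{u \in [0,1]} \int_{\mathbb{R}} \left(1 + \frac{(t - z_u)^2}{b^2}\right) \mathcal{N}_{z_u,b^2}(t)\,dt\\ &\le b\left(|h| + h^2\right). \end{aligned} \qquad (40)$$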
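A bound of the type $b(|h| + h^2)$ for the $L_1$ distance between two shifted Gaussian kernels, as in (40), can be checked numerically. The sketch below is an illustration under the assumption that $K$ is the standard Gaussian density; the function name, the integration window and the test pairs $(w, z)$ are arbitrary choices.

```python
import math

def gauss_kernel(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)

def l1_kernel_gap(w, z, b, half_width=12.0, steps=20000):
    """Trapezoidal approximation of the integral over t of
    |K((t - w)/b) - K((t - z)/b)| for the Gaussian kernel K."""
    lo = min(w, z) - half_width * b
    hi = max(w, z) + half_width * b
    step = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        t = lo + k * step
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * abs(gauss_kernel((t - w) / b) - gauss_kernel((t - z) / b))
    return total * step

b = 0.2
gaps = []
for w, z in [(0.0, 0.05), (0.0, 0.3), (1.0, -1.0)]:
    h = (z - w) / b
    # pair: (numerical L1 gap, candidate upper bound b * (|h| + h^2))
    gaps.append((l1_kernel_gap(w, z, b), b * (abs(h) + h * h)))
```

For small $|h|$ the gap is of order $b|h|$, which is why the plug-in error ends up being driven by $\|\hat\theta_n - \theta_*\|/b_n$.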



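Replacing $w$, $z$ respectively by the $Y_k^{\hat\theta_n}$ and $Y_k^{\theta_*}$, and $b$ by $b_n$ in (40), so that $h = (\hat\theta_n - \theta_*) \odot X_k / b_n$, we then obtain from (39) the following bound for the $L_1$ error:

$$\left\|\hat f_{n,\hat\theta_n} - \hat f_{n,\theta_*}\right\|_{L_1} \le C \left( \frac{\|\hat\theta_n - \theta_*\|_2}{b_n} \cdot \frac{1}{n} \sum_{k=1}^{n} (1 + |X_k|) + \frac{\|\hat\theta_n - \theta_*\|_2^2}{b_n^2} \cdot \frac{1}{n} \sum_{k=1}^{n} (1 + |X_k|)^2 \right), \qquad (41)$$

the same kind of bound being available for $\|\tilde I_{n,\hat\theta_n} - \tilde I_{n,\theta_*}\|_{L_1}$. In conclusion, according to the decomposition (38), point i) of Theorem 3.1, and the respective $L_1$ a.s. convergence of $\hat f_{n,\theta_*}$ and $\tilde I_{n,\theta_*}$ towards $f_{\theta_*}$ and $I_{\theta_*}$ under (20), we get from (41) and the strong law of large numbers that $\|\hat f_n - f\|_{L_1} \to 0$ almost surely as $n \to \infty$ whenever $n^{-1/4+\gamma}/b_n = o(1)$.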



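iii) The proof uses an integrated version of decomposition (38) and the fact that, for all $y \in \mathbb{R}$, the approximation $|\tilde F_{n,\hat\theta_n} - \tilde F_{n,\theta_*}|(y)$ is controlled by

$$\begin{aligned} \int_{-\infty}^{y} \frac{1}{n b_n} \sum_{k=1}^{n} \left| K\left(\frac{t - Y_k^{\hat\theta_n}}{b_n}\right) - K\left(\frac{t - Y_k^{\theta_*}}{b_n}\right) \right| dt &\le \frac{1}{n b_n} \sum_{k=1}^{n} \int_{\mathbb{R}} \left| K\left(\frac{t - Y_k^{\hat\theta_n}}{b_n}\right) - K\left(\frac{t - Y_k^{\theta_*}}{b_n}\right) \right| dt\\ &\le C \left( \frac{\|\hat\theta_n - \theta_*\|_2}{b_n} \cdot \frac{1}{n} \sum_{k=1}^{n} (1 + |X_k|) + \frac{\|\hat\theta_n - \theta_*\|_2^2}{b_n^2} \cdot \frac{1}{n} \sum_{k=1}^{n} (1 + |X_k|)^2 \right), \end{aligned} \qquad (42)$$

the last term in the right hand side of the above inequality being independent from $y$. The same bound holds for $|\tilde J_{n,\hat\theta_n} - \tilde J_{n,\theta_*}|(y)$ by an identical argument. To conclude, it is enough to use (42) and Corollary 1 p. 766 in [22], which allows us to control the terms $\|\tilde F_{n,\theta_*} - F_{\theta_*}\|_\infty$ and $\|\tilde J_{n,\theta_*} - J_{\theta_*}\|_\infty$, to obtain (37). The rate on the right hand side of (37) is optimized by considering $b_n = n^{-1/8}$, which then turns it into $O_{a.s.}(n^{-1/8+\gamma})$, for all $\gamma > 0$.

□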

4 Numerical experiments 
4.1 Role of the $\theta$-transformation

We propose in this section to highlight the role played by the $\theta$-transformation, see (7), in our method. For this purpose, we consider an example which corresponds to model (2) taking $p_* = 0.7$, $\alpha_* = 2$, $\beta_* = 1$, $\varepsilon_i^{[j]} \sim \mathcal{N}(0,1)$, $j = 0,1$, and $X_i \sim \mathcal{N}(2,3)$. In Fig. 1 we plot successively a simulated data set $(X_i, Y_i)_{1 \le i \le n}$, corresponding to the previous description with $n = 200$, and the two $\theta$-transformed datasets obtained with $\theta = (1, 0.5)$ and $\theta = \theta_* = (2, 1)$. These figures are completed by adding their corresponding 2nd-coordinate sample data histograms.

Figure 1: First row: resp. plot of an original data set $(X_i, Y_i)_{1 \le i \le n}$ according to model (3) with $n = 200$, plot of a wrong $\theta$-transformation ($\theta \neq \theta_*$), plot of the true $\theta_*$-transformation. Second row: resp. histograms of the corresponding first row 2nd-coordinate sample data.

Note that these histograms are empirical estimates of the densities $f_\theta$, by formula (8), with $\theta$ respectively equal to $(0,0)$, $(1,0.5)$ and $(2,1)$. We see clearly through these three situations how a progressive transformation of the data allows one to reach a tractable situation, in the sense that it looks strongly like the semiparametric contamination model (5) studied in [3] and [5], where a known density is mixed with a symmetric unknown density, which corresponds to the behavior observed in the second row, third column histogram in Fig. 1. Loosely speaking the second idea of our method consists in arguing that once $\theta$ is close to $\theta_*$ we are allowed to estimate the proportion $p$ according to a [5]-type method, which corresponds to the minimization step (21). In contrast to this technically satisfying idea, the $\theta$-transformation and the choice of the weight distribution $Q$ introduced in (13) are two sources of serious difficulties. In fact when $\beta$ is large and the law of the design data has heavy tails with respect to the tails of $f$, then the $\theta$-transformation will move the points coming from the $F_0$-population and located far from the origin to extremely distant positions, which implies intuitively that the integral-type density involved in (8) should be extremely heavy-tailed. Thus in order to capture the information contained in the tails of the $\theta$-transformed data set it is important to weight sufficiently the empirical index of symmetry $H^2(x; p, \tilde F_{n,\theta}, \hat J_{n,\theta})$ of expression (14) for large values of $x$, which reduces to choosing an instrumental distribution $Q$ with non-negligible tails with respect to $F_\theta$.

4.2 Optimization procedure and simulation study

The aim of this section is to illustrate graphically, on two-dimensional examples, the behavior of the empirical distance $d_n(p, 0, \beta)$ (the parameter $\alpha$ is assumed to be equal to zero) when $p$ and $\beta$ lie close to the true value of the parameter. For simplicity the parameter will still be denoted $\vartheta := (p, \beta)$, with $\theta := \beta$ and $d_n(\vartheta) := d_n(p, 0, \beta)$. The interest of this study is to understand closely the influence of the mixing proportion $p$ and the regression coefficient $\beta$ on the shape of the contrast function $d$ (flatness, sharpness, smoothness, etc.). Our models are denoted M1 and M2 and defined according to (2) as follows

M1: $p_* = 0.7$, $\beta_* = 1$, $V \sim Q = \mathcal{N}(0, 4^2)$, $\varepsilon^{[j]} \sim \mathcal{N}(0,1)$, $X \sim \mathcal{N}(2, 3^2)$,

M2: $p_* = 0.3$, $\beta_* = 1$, $V \sim Q = \mathcal{N}(0, 2^2)$, $\varepsilon^{[j]} \sim \mathcal{N}(0,1)$, $X \sim \mathcal{N}(2, 3^2)$,

where $j = 0, 1$.

In Fig. 2 we plot the mapping $(p, \beta) \mapsto d_n(p, \beta)$ obtained from an M1-sample, resp. M2-sample, of size $n = 100$, where $(p, \beta) \in \Theta_1 = [0.5, 0.8] \times [0.9, 1.1]$, resp. $(p, \beta) \in \Theta_2 = [0.1, 0.6] \times [0.6, 1.4]$. Notice that according to discussion (CG) at the end of Section 5.1, model M2 is not necessarily consistently estimated if the parameter space $\Theta_2$ contains the spurious solution $\vartheta_{**} = (2p_*, \beta_*/2)$, which is deliberately the case here.

Figure 2: Plot of $(p, \beta) \mapsto d_n(p, 0, \beta)$ with $n = 100$, $\beta_* = 1$, $\varepsilon_i^{[j]} \sim \mathcal{N}(0,1)$, $j = 0, 1$, $X_i \sim \mathcal{N}(2, 3^2)$, with the difference that on the left hand side $p_* = 0.7$, $V_i \sim \mathcal{N}(0, 4^2)$, whereas on the right hand side $p_* = 0.3$, $V_i \sim \mathcal{N}(0, 2^2)$.

In practice, Fig. 2 is obtained using the Scilab contour2d function, which plots the level curves of $d_n$ evaluated on a homogeneous $10 \times 10$ grid of the rectangular domain $[0.5, 0.8] \times [0.9, 1.1]$. Fig. 2 shows that the graph of $d_n$ looks like a sharp valley with a flat trough when $\beta$ is located near $\beta_*$ and $p$ ranges over $[0.5, 0.8]$. Even if on this simulated example the argmin of $d_n$ is very close to the true value of the parameter, the previous remark suggests that the estimation of the mixing proportion will be less robust than the estimation of the regression coefficient. The second plot in Fig. 2 is more unexpected, since the graph of $d_n$ does not really look like a contrast function, with its high values near $p = 0.1$ and its very large and flat trough that covers most of $\Theta_2$, suggesting a strong lack of robustness of our estimating method in that kind of situation. To validate these thoughts we propose to carry out a large sample study on these examples and on a third intermediate one obtained by considering $p_* = 0.3$ and $V \sim \mathcal{N}(0, 4^2)$. The results of this study will be summarized in Table 1. First we present the numerical approach used to approximate our M-estimator (21).



Gradient algorithm and tuning parameters. The gradient optimization procedure
(programmed with Scilab) used to compute our M-estimator $\hat\vartheta_n = (\hat p_n, \hat\beta_n)$
is defined as follows:

(i) Initialization: $\vartheta_1 = \vartheta_*$, $\vartheta_2 = \vartheta_* + \delta$;

(ii) while $\|\vartheta_2 - \vartheta_1\|_2 > \epsilon$ do $\vartheta_1 = \vartheta_2$ and $\vartheta_2 = \vartheta_1 - \gamma^T \dot d_n(\vartheta_1)$;

(iii) else $\hat\vartheta_n = \vartheta_2$,

where $\delta \in \mathbb{R}$ is used to create a small perturbation of the initial value, $\epsilon > 0$
defines the wanted stabilization level in the stopping rule of the algorithm, and
$\gamma \in (\mathbb{R}_+^*)^2$ is a scale parameter that needs to be hand-tuned for good efficiency in
practice (to avoid reverberation phenomena when the score function becomes
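abruptly sharp). The score function $\dot d_n = \left(\frac{\partial}{\partial p} d_n, \frac{\partial}{\partial \beta} d_n\right)^T$ can be expressed
in closed form, i.e., for $u \in \{p, \beta\}$,

$$\frac{\partial}{\partial u} d_n(\vartheta) = 2 \int \dot h_{n,u}(y, \vartheta)\, H_n(y, \vartheta)\, dQ_n(y),$$

where for all $y \in \mathbb{R}$,

$$\dot h_{n,p}(y, \vartheta) := \frac{\partial}{\partial p} H_n(y, \vartheta) = -\frac{1}{p^2}\left(\tilde F_{n,\theta}(y) + \tilde F_{n,\theta}(-y) - \left[J_{n,\theta}(y) + J_{n,\theta}(-y)\right]\right),$$

$$\dot h_{n,\beta}(y, \vartheta) := \frac{\partial}{\partial \beta} H_n(y, \vartheta) = \frac{1}{p}\left(\dot F_{n,\beta}(y) + \dot F_{n,\beta}(-y)\right) - \frac{1-p}{p}\left(\dot J_{n,\beta}(y) + \dot J_{n,\beta}(-y)\right),$$

and where, from (18) and (19)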

d ~ 

*n,/3(y) := g^KAv) 



- -y 



1=1 



and similarly, from (17) 



d - 



1 " 



n 

i=l 



The kernel $K$ used to compute (19) is a triangular kernel defined by

$$K(x) = (1 - |x|)\,\mathbb{1}_{-1 \le x \le 1}, \quad x \in \mathbb{R},$$

and the bandwidth is $b_n = \sqrt{1 + 4p(1 - p)}\,(4/(3n))^{1/5}$ (proposed by [7] for Gaussian
distributions and implemented in R), both obviously satisfying condition
(K). The results summarized in Table 1 were obtained with the following hand-tuned
parameters: $\delta = 0.01$, $\epsilon = 0.005$, $\gamma = (0.2, 0.5)^T$, and an example of
stabilization for this set of tuning parameters is illustrated in Fig. 2, where the
successive positions (until stabilization) of our algorithm are depicted by cross
symbols.
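The procedure (i)-(iii) and the kernel and bandwidth choices can be sketched in Python as follows. This is only an illustrative sketch, not the authors' Scilab program: the names `triangular_kernel`, `bandwidth` and `gradient_descent` are hypothetical, the product $\gamma^T \dot d_n$ is read as a componentwise rescaling of the score, the bandwidth is read as $\sqrt{1 + 4p(1-p)}\,(4/(3n))^{1/5}$, and the score function is passed in as an arbitrary callable `grad` rather than computed from the data.

```python
import math


def triangular_kernel(x):
    """Triangular kernel K(x) = (1 - |x|) on [-1, 1], zero elsewhere."""
    return max(0.0, 1.0 - abs(x))


def bandwidth(n, p):
    """Bandwidth b_n = sqrt(1 + 4 p (1 - p)) * (4 / (3 n))**(1/5) (assumed reading)."""
    return math.sqrt(1.0 + 4.0 * p * (1.0 - p)) * (4.0 / (3.0 * n)) ** 0.2


def gradient_descent(grad, theta_star, delta, eps, gamma, max_iter=10_000):
    """Steps (i)-(iii): start at theta_star, perturb by delta, iterate until stable.

    grad  : callable returning the score (d/dp, d/dbeta) at a point (hypothetical)
    gamma : per-coordinate step sizes, applied componentwise (assumption)
    """
    theta1 = list(theta_star)                    # (i) initialization
    theta2 = [t + delta for t in theta_star]     # small perturbation
    for _ in range(max_iter):                    # (ii) iterate while not stabilized
        if math.dist(theta1, theta2) <= eps:
            break
        theta1 = theta2
        score = grad(theta1)
        theta2 = [t - g * s for t, g, s in zip(theta1, gamma, score)]
    return theta2                                # (iii) stabilized value
```

With a toy quadratic contrast such as $(p - 0.7)^2 + (\beta - 1)^2$ in place of $d_n$, the loop stops close to $(0.7, 1)$ with the tuning values $\delta = 0.01$, $\epsilon = 0.005$, $\gamma = (0.2, 0.5)$; a `max_iter` guard is added here because the original stopping rule alone does not exclude oscillation.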

Table 1: Mean and Std. Dev. of 100 estimates of $(p, \beta)$.

  n     $(p_*, \beta_*, \sigma_V)$    Empirical means      Standard deviation
  100   (0.7, 1, 4)                   (0.7055, 1.0051)     (0.0373, 0.0697)
  200   (0.7, 1, 4)                   (0.6976, 0.9965)     (0.0307, 0.0590)
  500   (0.7, 1, 4)                   (0.6954, 1.0059)     (0.0296, 0.0358)
  100   (0.3, 1, 4)                   (0.3100, 0.9581)     (0.0577, 0.1252)
  200   (0.3, 1, 4)                   (0.2965, 0.9851)     (0.0501, 0.0855)
  500   (0.3, 1, 4)                   (0.2975, 1.0178)     (0.0284, 0.0414)
  100   (0.3, 1, 2)                   (0.3971, 0.8587)     (0.0942, 0.2213)
  200   (0.3, 1, 2)                   (0.3982, 0.9149)     (0.0835, 0.1900)
  500   (0.3, 1, 2)                   (0.3315, 0.9683)     (0.0524, 0.1067)

Comments on Table 1. First of all, it is interesting to compare the performances
summarized in rows 1-3 of Table 1 to those obtained in [5], p. 35, Table 1 where
the model of interest is (5), with $p = 0.7$, $\mu = 3$, and $f_0$ and $f$ are respectively
the pdfs corresponding to the $\mathcal{N}(0, 1)$ and $\mathcal{N}(0, (1/2)^2)$ distributions. Even if
these two models are not strictly comparable we think that it is interesting,
in order to highlight the drawbacks induced by the $\theta$-transformation and the
choice of $Q$ discussed above, to compare pairwise the performance obtained on

the mixing proportion $p$ and the parameters that influence the location of the
$F$-population, i.e. $\beta$ and $\mu$. From the numerical point of view, we easily check
that the bias of our estimators, for both models, is negligible. However it also
appears that the standard deviation associated to $(\hat p_n, \hat\beta_n)$ decreases
significantly slower than the standard deviation associated to $(\hat p_n, \hat\mu_n)$ when $n$ grows.
The performance summarized in rows 4-6 of Table 1, which corresponds to
$p_* = 0.3$ (and hence signifies that the population that will move far from its
original position due to the $\theta$-transformation will be more important), is very
instructive. We observe that for small $n$ ($n = 100, 200$) the standard deviations
associated to $(\hat p_n, \hat\beta_n)$ are dramatically large compared to those obtained with
$p_* = 0.7$. Let std$(n, p_*, \beta_*, \sigma_V)$ denote the pair of standard deviations reported in
the last column of Table 1 under $(n, p_*, \beta_*, \sigma_V)$. If we compute componentwise
the ratios std$(n, 0.3, 1, 4)$/std$(n, 0.7, 1, 4)$ respectively for $n = 100, 200, 500$ we
obtain approximately (1.54, 1.8), (1.67, 1.44), and (0.95, 1.17), which seems to
suggest that when $n$ becomes large the side effect of the $\theta$-transformation
vanishes (probably thanks to the size of $n$, which increases globally the precision
of the empirical estimates, and the tails of $Q$, that allow the algorithm to take
these improvements into account efficiently). The performances summarized
in rows 7-9 of Table 1 seem to confirm the concerns expressed about model
M2. We recall that model M2 is badly affected by the two following
drawbacks: smallness of $p_*$ (synonymous with an important population shifted far by
the $\theta$-transformation and with the existence of a spurious solution) and smallness of
$\sigma_V$, which is then clearly not sufficient to counteract the smallness of $p_*$ (and
its consequences). We think in particular that, in model M2, the empirical
contrast $d_n$ is more easily close to 0 under $\beta = \beta_*/2$ since, as explained in
Section 4.1, this value is then significantly smaller than $\beta_*$. This last remark
explains why, in spite of the fact that our algorithms were initialized at the
true parameter value, our estimates are strongly biased (attracted quite often
by the spurious solution $\vartheta_{**}$).
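The componentwise ratios discussed above can be recomputed directly from the last column of Table 1. The short sketch below does this; the dictionary `STD` and the helper `std_ratio` are hypothetical names introduced for illustration, and the numbers are simply copied from the table. The recomputed values agree with the ones quoted in the text to within a few hundredths.

```python
# Standard deviations (p, beta) copied from the last column of Table 1,
# indexed by (n, p_star, sigma_V); beta_star = 1 throughout.
STD = {
    (100, 0.7, 4): (0.0373, 0.0697),
    (200, 0.7, 4): (0.0307, 0.0590),
    (500, 0.7, 4): (0.0296, 0.0358),
    (100, 0.3, 4): (0.0577, 0.1252),
    (200, 0.3, 4): (0.0501, 0.0855),
    (500, 0.3, 4): (0.0284, 0.0414),
}


def std_ratio(n):
    """Componentwise ratio std(n, 0.3, 1, 4) / std(n, 0.7, 1, 4)."""
    numerator = STD[(n, 0.3, 4)]
    denominator = STD[(n, 0.7, 4)]
    return tuple(a / b for a, b in zip(numerator, denominator))


for n in (100, 200, 500):
    ratio_p, ratio_beta = std_ratio(n)
    print(n, round(ratio_p, 2), round(ratio_beta, 2))
```

Both ratios exceed 1 for $n = 100$ and $n = 200$, while for $n = 500$ the ratio for $p$ drops just below 1, which is the pattern the comments rely on.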

Robustness with respect to the symmetry assumption. We propose to conclude
this simulation study by testing our method in situations where the law of the
error $\varepsilon^{[1]}$ is no longer symmetric. For this purpose we consider again model
M1 and replace the distribution of $\varepsilon^{[1]}$ by the mixture

$$\lambda\,\mathcal{N}(-0.7, 1/\sqrt{2}) + (1 - \lambda)\,\mathcal{N}(0.7\lambda/(1 - \lambda), 1/\sqrt{2})$$

whose pdf, denoted $f_\lambda$, is nonsymmetric if $\lambda \ne 0.5$ but has a mean equal to 0
and a variance equal to 0.5 for all $\lambda \in (0, 1)$. In our simulations we consider
successively $\lambda = 0.5, 0.55, 0.6$, which leads to pdfs for $\varepsilon^{[1]}$ whose graphs
are plotted in Fig. 3. Some performances of our method on these examples are




Figure 3: Graphs of the pdfs corresponding to the mixture distribution
$\lambda\,\mathcal{N}(-0.7, 1/\sqrt{2}) + (1 - \lambda)\,\mathcal{N}(0.7\lambda/(1 - \lambda), 1/\sqrt{2})$, obtained by considering
$\lambda = 0.5, 0.55, 0.6$.

summarized in Table 2. 
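As a quick check of the centering of the mixture used for $\varepsilon^{[1]}$, the following worked computation (a sketch using only the two component means stated above) confirms that the mean vanishes for every $\lambda \in (0, 1)$ and lists the location of the second component for the three values of $\lambda$ considered.

```latex
\begin{align*}
E\big(\varepsilon^{[1]}\big)
  &= \lambda\,(-0.7) + (1-\lambda)\,\frac{0.7\,\lambda}{1-\lambda}
   = -0.7\,\lambda + 0.7\,\lambda = 0, \\
\frac{0.7\,\lambda}{1-\lambda}
  &= 0.7,\quad 0.8556,\quad 1.05
  \qquad \text{for } \lambda = 0.5,\ 0.55,\ 0.6 .
\end{align*}
```

As $\lambda$ grows, more mass sits at $-0.7$ while the lighter component moves further to the right, which is the source of the asymmetry.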

Table 2: Mean and Std. Dev. of 100 estimates of $(p, \beta)$.

  n     $\lambda$    Empirical means      Standard deviation
  100   0.5          (0.7035, 1.0229)     (0.0427, 0.0814)
  200   0.5          (0.7012, 1.0068)     (0.0390, 0.0774)
  500   0.5          (0.6997, 1.0059)     (0.0244, 0.0488)
  100   0.55         (0.6854, 1.0837)     (0.0485, 0.0858)
  200   0.55         (0.6890, 1.0805)     (0.0431, 0.0716)
  500   0.55         (0.6922, 1.0699)     (0.0377, 0.0519)
  100   0.6          (0.6731, 1.1314)     (0.0543, 0.0952)
  200   0.6          (0.6693, 1.1061)     (0.0490, 0.0868)
  500   0.6          (0.6775, 1.0928)     (0.0392, 0.0557)

Comments on Table 2. Note that when $\lambda = 0.5$ (symmetric case) the performances
of our method are very close to those obtained on model M1. However

for $n = 100, 200$ the standard deviation of our estimates is larger than those
obtained in the M1 model, whereas for $n = 500$ the standard deviation becomes
slightly smaller. This behavior can probably be explained by the fact that the
graph of $f_{0.5}$ is flat on its top, which intuitively does not help much in locating the
axis of symmetry for small values of $n$. On the other hand we can expect that
for $n = 500$, helped by the fact that var$(\varepsilon^{[1]})$ is here equal to 0.5 whereas it was
equal to 1 in M1, our nonparametric estimators perform better than in model
M1, which should explain the good performances observed in the third row of
Table 2. When $\lambda = 0.55, 0.6$ it appears that the parameter $\beta$ is always
overestimated. This phenomenon can be explained by the fact that our method tries
to determine a pseudo-axis of symmetry adapted to the shapeless graph of $f_\lambda$
which qualitatively is placed on the left side of the origin. This remark implies
that the $\theta$-transformation needed to transform the first integral in (8) into an
almost even density (see Fig. 2) has to contain a $\beta$ greater than $\beta_*$.

5 Appendix 

5.1 Conditions (R) and (C) in the Gaussian Case 

In this section we discuss conditions (R) and (C) when the true underlying
model is a contaminated Gaussian regression model with Gaussian design, i.e.,
$f$, $f_0$, $h$ are respectively the pdfs of the $\mathcal{N}(0, m)$, $\mathcal{N}(0, m_0)$, and $\mathcal{N}(E(X), \sigma_X^2)$
distributions.

Comments on condition (R). Conditions (R) i-iii) are standard and easy to
verify in the above model. On the other hand, it is interesting to show how
conditions (R) iv-v) arise naturally in this case.

Condition (R) iv). We show for simplicity that the first condition in (R) iv)
(the same kind of proof works also for the second one) holds when $i = 0$,
$\theta_* = (\alpha_*, \beta_*) \in (\mathbb{R}_+^*)^2$ and $m_0 = 1$. We write the decomposition

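$$
\begin{aligned}
|F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|
 ={}& |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{y > 1 - \theta_* \odot x,\ x < -\alpha_*/\beta_*} \\
 &+ |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{y > 1 + \theta_* \odot x,\ x > -\alpha_*/\beta_*} \\
 &+ |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{y < -1 + \theta_* \odot x,\ x < -\alpha_*/\beta_*} \\
 &+ |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{y < -1 - \theta_* \odot x,\ x > -\alpha_*/\beta_*} \\
 &+ |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{-1 + \theta_* \odot x < y < 1 - \theta_* \odot x,\ x < -\alpha_*/\beta_*} \\
 &+ |F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)|\,\mathbb{1}_{-1 - \theta_* \odot x < y < 1 + \theta_* \odot x,\ x > -\alpha_*/\beta_*}.
\end{aligned}
$$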

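Consider the first term on the right hand side of the above decomposition (the
three following terms being treated in entirely the same way). For all $y > 1 - \theta_* \odot x$
with $x < -\alpha_*/\beta_*$ we have $y - \theta_* \odot x > y + \theta_* \odot x > 1$. Since for $t > 0$ large
enough, the inequality

$$1 - F_0(t) \le \frac{1}{t\sqrt{2\pi}} \exp(-t^2/2) \tag{43}$$

is valid, we have in particular that for all $t > 1$, $0 < 1 - F_0(t) \le \exp(-t^2/2)/\sqrt{2\pi}$. Hence
it follows that for $y > 1 - \theta_* \odot x$ with $x < -\alpha_*/\beta_*$:

$$|F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)| \le \frac{\exp(-(y + \theta_* \odot x)^2/2) + \exp(-(y - \theta_* \odot x)^2/2)}{\sqrt{2\pi}},$$

which proves that this first term is $h(x)\,dx\,dy$ integrable. Let us now sum the
last two terms of the above decomposition and notice that

$$|F_0(y + \theta_* \odot x) - F_0(y - \theta_* \odot x)| \times \left(\mathbb{1}_{-1 + \theta_* \odot x < y < 1 - \theta_* \odot x,\ x < -\alpha_*/\beta_*} + \mathbb{1}_{-1 - \theta_* \odot x < y < 1 + \theta_* \odot x,\ x > -\alpha_*/\beta_*}\right) \le \mathbb{1}_{|y| < 1 + |\theta_* \odot x|}.$$

We thus prove that this sum of terms is also $h(x)\,dx\,dy$ integrable.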
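The Gaussian tail bound used for $t > 1$ can be checked numerically. The sketch below assumes $F_0$ is the standard Gaussian cdf ($m_0 = 1$) and compares $1 - F_0(t)$ with $\exp(-t^2/2)/\sqrt{2\pi}$ on a few points; the function names `gaussian_tail` and `tail_bound` are hypothetical.

```python
import math


def gaussian_tail(t):
    """1 - F_0(t) for the standard Gaussian cdf F_0, via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_bound(t):
    """Upper bound exp(-t^2 / 2) / sqrt(2 pi), used in the text for t > 1."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


# Triples (t, exact tail, bound) on a few points t >= 1.
checks = [(t, gaussian_tail(t), tail_bound(t)) for t in (1.0, 1.5, 2.0, 3.0, 5.0)]
for t, tail, bound in checks:
    print(t, tail, bound, tail <= bound)
```

The bound is loose near $t = 1$ and tightens only in relative order of magnitude, but it is all that is needed for integrability of the tail terms.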

Condition (R) v). We consider for simplicity the construction of the bounding 
function when (a,/?) G $ = [a, a] x with (a, /3) G and m — 

mo — 1. Notice first that for all {x,y) e 



^ V 2 

Secondly it is easy to check that for all {x,y) G and all 6' G 
{y + a + Pxf 

exp ■ 



^ , {y + M' \, 

< exp ( Y ) ^x>0,y>Q 

T[iin{\y-\-a-\- j3x\,\y-\-a-\- j3x\f 
+ exp I =^ ) lx>o,y<a 

min(|y + a + |y + a + 
+ exp I = ) la;<o,j/eK 

< B^{x,y), 



where 



^ , , , {y + ^xy\ { (y + ^ + pxY 

B^{x,y) := exp ( ^= 1 + exp I 

{y + a + ^xf\ ( {y + a + ^xf 

+ exp ( ^ I + exp 

{y + oi + I3xf 

+ exp ■ 



Thus we can propose = + a + ^|x|)/-v^7r-B$(x, y) exp(— 

which clearly belongs to $L_1(\mathbb{R}^2)$, as a candidate for the uniformly bounding
function satisfying condition (R) v).



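Comments on condition (C). In the whole Gaussian case, expression (29)
becomes:

$$p_* \sin\big(((\alpha - \alpha_*) + (\beta - \beta_*)E(X))\,t\big) \exp\left(-\frac{t^2}{2}\left(\sigma_X^2(\beta - \beta_*)^2 + m\right)\right)
= (p_* - p) \sin\big((\alpha + \beta E(X))\,t\big) \exp\left(-\frac{t^2}{2}\left(\sigma_X^2\beta^2 + m_0\right)\right). \tag{44}$$

We suppose first that $p \ne p_*$, and denote $\xi := \alpha + \beta E(X)$, $\xi_* := \alpha_* + \beta_* E(X)$,
$\Sigma_{\beta - \beta_*} := \sigma_X^2(\beta - \beta_*)^2 + m$, and $\Sigma_\beta := \sigma_X^2\beta^2 + m_0$. Taking the first and third
order derivative of (44) at point $t = 0$ we get the conditions

$$p_*\xi_* = p\,\xi, \quad \text{and} \quad (p - p_*)\,\xi\,(3\Sigma_\beta + \xi^2) + p_*(\xi - \xi_*)\left(3\Sigma_{\beta - \beta_*} + (\xi - \xi_*)^2\right) = 0. \tag{45}$$

Introducing the first relation in (45) into the second one, we obtain

$$3\left[\Sigma_\beta - \Sigma_{\beta - \beta_*}\right] + \frac{p\,(2p_* - p)}{p_*^2}\,\xi^2 = 0. \tag{46}$$

Now we observe that, to ensure the validity of expression (44), the factors
multiplied by the sin terms on both sides of (44) must be, at least, equivalent
as $t \to \infty$. This last remark implies that $\Sigma_\beta = \Sigma_{\beta - \beta_*}$, or equivalently
$\beta = \beta_*/2 + (m - m_0)/(2\beta_*\sigma_X^2)$, and thus (46) leads to

$$p = 2p_*. \tag{47}$$

Using now the first relation in (45) and (47), we then obtain
$\alpha = \alpha_*/2 + E(X)(m_0 - m)/(2\beta_*\sigma_X^2)$.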
The consequences of the previous comments can be presented as follows:
Discussion (CG):

i) If $p_* > 1/2$ then the set of parameters $\vartheta \in \Theta$ satisfying condition (44) is
always empty, since $p = 2p_* > 1$ is not an admissible solution.

ii) If $p_* < 1/2$ and if, for example, $E(X) = 0$ and $m_0 = m$ then $\vartheta_{**} =
(2p_*, \alpha_*/2, \beta_*/2)$. In such a case it would be crucial to build a conveniently
constrained parametric space (most of the time a plot of the dataset helps
in building reasonable constraints on the intercept and slope parameter
spaces) expecting that it contains $\vartheta_*$ but not $\vartheta_{**}$.

iii) More generally one can expect that when the shape of the sample data
(see e.g. Fig. 1) suggests that $(m_0 - m)/(2\beta_*\sigma_X^2)$ is negligible with respect to
$\alpha_*$ and $\beta_*$, which occurs when $m_0$ is close to $m$ and/or $\sigma_X^2$ is very large,
then the solution proposed in ii) is loosely speaking still valid.
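The spurious solution of point ii) can be checked numerically against (44). The sketch below assumes (44) reads $p_* \sin((\xi - \xi_*)t)\exp(-t^2\Sigma_{\beta-\beta_*}/2) = (p_* - p)\sin(\xi t)\exp(-t^2\Sigma_\beta/2)$ with the notation of the comments on condition (C); the function names `lhs_44` and `rhs_44` and the numerical values are hypothetical, chosen only for illustration with $m_0 = m$.

```python
import math


def lhs_44(t, p_star, alpha, beta, alpha_star, beta_star, mean_x, var_x, m):
    """Left-hand side of (44): p_* sin((xi - xi_*) t) exp(-t^2 Sigma_{beta-beta_*} / 2)."""
    xi = alpha + beta * mean_x
    xi_star = alpha_star + beta_star * mean_x
    sigma = var_x * (beta - beta_star) ** 2 + m
    return p_star * math.sin((xi - xi_star) * t) * math.exp(-0.5 * t * t * sigma)


def rhs_44(t, p, p_star, alpha, beta, mean_x, var_x, m0):
    """Right-hand side of (44): (p_* - p) sin(xi t) exp(-t^2 Sigma_beta / 2)."""
    xi = alpha + beta * mean_x
    sigma = var_x * beta ** 2 + m0
    return (p_star - p) * math.sin(xi * t) * math.exp(-0.5 * t * t * sigma)


# Illustrative true values (assumptions): p_* = 0.3, alpha_* = 0.4, beta_* = 1,
# E(X) = 2, var(X) = 9, m = m0 = 1.
P_STAR, A_STAR, B_STAR, MEAN_X, VAR_X, M = 0.3, 0.4, 1.0, 2.0, 9.0, 1.0


def gap(t, p, alpha, beta):
    """Absolute difference between both sides of (44) at frequency t."""
    left = lhs_44(t, P_STAR, alpha, beta, A_STAR, B_STAR, MEAN_X, VAR_X, M)
    right = rhs_44(t, p, P_STAR, alpha, beta, MEAN_X, VAR_X, M)
    return abs(left - right)


# Spurious candidate (2 p_*, alpha_*/2, beta_*/2) versus a generic candidate.
spurious = max(gap(0.1 * k, 2 * P_STAR, A_STAR / 2, B_STAR / 2) for k in range(1, 50))
generic = gap(1.0, 2 * P_STAR, A_STAR / 2, B_STAR)
print(spurious, generic)
```

In this setting the gap vanishes (up to rounding) on the whole grid for the spurious candidate, while a candidate with the same $p$ and $\alpha$ but $\beta = \beta_*$ leaves a clearly nonzero gap.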

5.2 Explicit formula of $H(\cdot, \vartheta)$ and its derivatives

In this section all the expressions are valid for all $(\vartheta, y) \in \Theta \times \mathbb{R}$, and the
computation of the various derivative functions (under the integral sign) is
allowed according to Lebesgue's Theorem and condition (R). According to
(8) and (10), we have




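$$
\begin{aligned}
H(y, \vartheta) ={}& \frac{p_*}{p}\left(\int_{-\infty}^{y}\!\int_{\mathbb{R}} f(z + (\theta - \theta_*) \odot x)\,h(x)\,dx\,dz + \int_{-\infty}^{-y}\!\int_{\mathbb{R}} f(z + (\theta - \theta_*) \odot x)\,h(x)\,dx\,dz\right) \\
&+ \frac{p - p_*}{p}\left(\int_{-\infty}^{y}\!\int_{\mathbb{R}} f_0(z + \theta \odot x)\,h(x)\,dx\,dz + \int_{-\infty}^{-y}\!\int_{\mathbb{R}} f_0(z + \theta \odot x)\,h(x)\,dx\,dz\right) - 1.
\end{aligned}
$$

For simplicity we introduce

$$F^{\theta}(y) = \int_{-\infty}^{y}\!\int_{\mathbb{R}} f(z + (\theta - \theta_*) \odot x)\,h(x)\,dx\,dz,$$

$$F_0^{\theta}(y) = \int_{-\infty}^{y}\!\int_{\mathbb{R}} f_0(z + \theta \odot x)\,h(x)\,dx\,dz,$$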




which leads to

$$\frac{\partial}{\partial p} H(y, \vartheta) = -\frac{p_*}{p^2}\left[\left(F^{\theta}(y) + F^{\theta}(-y)\right) - \left(F_0^{\theta}(y) + F_0^{\theta}(-y)\right)\right].$$

Let us denote

Kiy) 

and for 



we obtain 
d 



da 



d_ 
da 
d_ 
da 



F\y) 
F'o{y) 



d_ 
dp 
d_ 
d^ 



Fe{y) -- 
Foe{y) 



f{z + {9 - 9^) x)h{x)dxdz, 
fo{z + 9 Q x)h{x)dxdz, 



xf{z + {9- 9*) x)h{x)dxdz, 
xfo{z + 6* x)h{x)dxdz, 



J {F-{y) + F-{-y)) + ^ ( Fo"(y) + F,-{-y) ) . 



d_ 

w 



H{y,^) = ^ [P^iy) + Pf^i-y)) + ^ (^^(y) + P,^{-y)) . 



At point — the Hessian matrix of H{-,d) defined in (34) is obtained by 
considering 



H{y,^.)^ 



^^W*(y) + Fo^*(-y)-l) ^ 



V 



V{y) 
V{y)E{x) 



(48) 



Let us denote now 



^'''"(^) = ^^'(^) = r [ xf{z+{9-9,)Qx)h{x)dxdz, 

dpda J_^ Jr 

^o'"(y) = ^Poiy) = r f xUz + 9Qx)h{x)dxdz, 

F'^'^'iy) = S^F\y) = r [ f{z +{9- 9^) x)h{x)dxdz, 



Fo'''{y) 



da^ 

d(5'- 




x'^foi^ + 9 Q x)h{x)dxdz. 
We then obtain
92 



dp 
dudp 

dudv 



p 

p3 



[{F\y) + F\-y))-{F^{y) + Foe{-y))], 



H{y,^) 



p 

p2 



+ F"{-y)) (F^iy) + Fo (-y)) , u = a, P, 



p — p 



5.3 Boundedness 

Boundedness of $f_\theta(\cdot)$ and $f'_\theta(\cdot)$. If $f$ and $f_0$ are supposed to be bounded over
$\mathbb{R}$ then we clearly have from (8) that

$$|f_\theta(y)| \le \|f\|_\infty + \|f_0\|_\infty, \quad (\theta, y) \in \Phi \times \mathbb{R}.$$

The same kind of argument holds to prove boundedness of $f'_\theta(\cdot)$ when (R) ii)
is supposed.

Boundedness of $H(\cdot, \vartheta)$. Since for all $\theta \in \Phi$ the functions $F_\theta(\cdot)$ and $J_\theta(\cdot)$ are
both cdfs, we thus have, since $\delta < p < 1 - \delta$, from expression (12):

H{y,^)<^ + 1, (y,i?)eMx$. (49) 
d 

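5.4 Integrable Lipschitz property of $f_\theta(\cdot)$

From (8), for all $(y, \theta, \theta') \in \mathbb{R} \times \Phi^2$ we have

$$
\begin{aligned}
|f_\theta(y) - f_{\theta'}(y)| \le{}& p_* \int_{\mathbb{R}} |f(y + (\theta - \theta_*) \odot x) - f(y + (\theta' - \theta_*) \odot x)|\,h(x)\,dx \\
&+ (1 - p_*) \int_{\mathbb{R}} |f_0(y + \theta \odot x) - f_0(y + \theta' \odot x)|\,h(x)\,dx.
\end{aligned} \tag{50}
$$

Consider for simplicity the first integral term on the right hand side of (50)
(the same argument holding for the second term). According to the Mean
Value Theorem there exists, for all $(x, y) \in \mathbb{R}^2$ and $(\theta, \theta') \in \Phi^2$, a value
$\gamma := \gamma(x, y, \theta, \theta')$ belonging to the line segment with extremities $y + (\theta - \theta_*) \odot x$
and $y + (\theta' - \theta_*) \odot x$, or equivalently a $\bar\theta := \bar\theta(x, y, \theta, \theta')$ belonging to the line
segment with extremities $\theta$ and $\theta'$, such that $\gamma = y + (\bar\theta - \theta_*) \odot x$ and

$$
\begin{aligned}
|f(y + (\theta - \theta_*) \odot x) - f(y + (\theta' - \theta_*) \odot x)|
 &= |f'(\gamma)\,(\alpha - \alpha' + (\beta - \beta')x)| \\
 &= |f'(y + (\bar\theta - \theta_*) \odot x)\,(\alpha - \alpha' + (\beta - \beta')x)| \\
 &\le \sup_{\theta \in \Phi} |f'(y + (\theta - \theta_*) \odot x)|\,(|\alpha - \alpha'| + |\beta - \beta'||x|).
\end{aligned}
$$

From condition (R) ii) there thus exists a nonnegative constant $c$ such that

$$
\int_{\mathbb{R}} |f_\theta(y) - f_{\theta'}(y)|\,dy
\le \|\theta - \theta'\|_2 \int_{\mathbb{R}^2} (1 + |x|)\left(\sup_{\theta \in \Phi} |f'(z + (\theta - \theta_*) \odot x)| + \sup_{\theta \in \Phi} |f_0'(z + \theta \odot x)|\right) h(x)\,dx\,dz
\le c\,\|\theta - \theta'\|_2.
$$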

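5.5 Uniform Lipschitz property of $H(\cdot, \vartheta)$

Let us write

$$
\begin{aligned}
H(y, \vartheta) - H(y, \vartheta')
 ={}& \frac{1}{p}\left(F_\theta(y) - F_{\theta'}(y) + F_\theta(-y) - F_{\theta'}(-y)\right) + \frac{p' - p}{p\,p'}\left(F_{\theta'}(y) + F_{\theta'}(-y)\right) \\
 &+ \frac{p - 1}{p}\left(J_\theta(y) - J_{\theta'}(y) + J_\theta(-y) - J_{\theta'}(-y)\right) + \frac{p - p'}{p\,p'}\left(J_{\theta'}(y) + J_{\theta'}(-y)\right).
\end{aligned}
$$

To prove the uniform Lipschitz property of $H(\cdot, \vartheta)$ we need to prove it for
$J_\theta(\cdot)$ and $F_\theta(\cdot)$. We begin with the simplest term $J_\theta(\cdot)$. According again to
the mean value theorem, for all $y \in \mathbb{R}$, all $(x, z) \in \mathbb{R}^2$ with $z \le y$, and all
$(\theta, \theta') \in \Phi^2$ there exists $\bar\theta := \bar\theta(x, z, \theta, \theta')$ belonging to the line segment with
extremities $\theta$ and $\theta'$ such that

$$
\begin{aligned}
|J_\theta(y) - J_{\theta'}(y)|
 &\le \int_{-\infty}^{y}\!\int_{\mathbb{R}} |f_0(z + \theta \odot x) - f_0(z + \theta' \odot x)|\,h(x)\,dx\,dz \\
 &\le \int_{-\infty}^{y}\!\int_{\mathbb{R}} |f_0'(z + \bar\theta \odot x)|\,(|\alpha - \alpha'| + |\beta - \beta'||x|)\,h(x)\,dx\,dz \\
 &\le \int_{-\infty}^{y}\!\int_{\mathbb{R}} \sup_{\theta \in \Phi} |f_0'(z + \theta \odot x)|\,h(x)\,dx\,dz\;|\alpha - \alpha'|
   + \int_{-\infty}^{y}\!\int_{\mathbb{R}} |x| \sup_{\theta \in \Phi} |f_0'(z + \theta \odot x)|\,h(x)\,dx\,dz\;|\beta - \beta'| \\
 &\le c\,\|\theta - \theta'\|_2,
\end{aligned}
$$

where $c$ denotes a nonnegative constant arising from condition (R) ii). Using
the same kind of argument we prove that there exists a nonnegative constant
$c'$ such that for all $(y, \theta, \theta') \in \mathbb{R} \times \Phi^2$

$$|F_\theta(y) - F_{\theta'}(y)| \le c'\,\|\theta - \theta'\|_2.$$

In conclusion, there exists a nonnegative constant $c''$ such that
for all $(y, \vartheta, \vartheta') \in \mathbb{R} \times \Theta^2$

$$|H(y, \vartheta) - H(y, \vartheta')| \le \frac{2}{\delta}(c + c')\,\|\theta - \theta'\|_2 + \frac{4}{\delta^2}\,|p - p'| \le c''\,\|\vartheta - \vartheta'\|_2.$$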

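5.6 Uniform almost sure rate of convergence of $d_n$

Let us consider

$$|d_n(\vartheta) - d(\vartheta)| \le T_{1,n}(\vartheta) + T_{2,n}(\vartheta),$$

where

$$T_{1,n}(\vartheta) = \left|\frac{1}{n}\sum_{i=1}^{n} H^2(V_i; \vartheta, \tilde F_{n,\theta}, J_{n,\theta}) - H^2(V_i; \vartheta, F_\theta, J_\theta)\right|,$$

$$T_{2,n}(\vartheta) = \left|\frac{1}{n}\sum_{i=1}^{n} H^2(V_i; \vartheta, F_\theta, J_\theta) - E\left(H^2(V_1; \vartheta, F_\theta, J_\theta)\right)\right|.$$

Uniform almost sure rate of convergence of $T_{1,n}$. Note first that from
boundedness of $H(\cdot; \vartheta, \tilde F_{n,\theta}, J_{n,\theta})$ and $H(\cdot; \vartheta, F_\theta, J_\theta)$ given by (49), there exist
nonnegative constants $C$ and $C'$ such that

$$
\begin{aligned}
T_{1,n}(\vartheta)
 &= \left|\frac{1}{n}\sum_{i=1}^{n} \left(H(V_i; \vartheta, \tilde F_{n,\theta}, J_{n,\theta}) + H(V_i; \vartheta, F_\theta, J_\theta)\right) \times \left(H(V_i; \vartheta, \tilde F_{n,\theta}, J_{n,\theta}) - H(V_i; \vartheta, F_\theta, J_\theta)\right)\right| \\
 &\le C \sup_{y \in \mathbb{R}} |H(y; \vartheta, \tilde F_{n,\theta}, J_{n,\theta}) - H(y; \vartheta, F_\theta, J_\theta)| \\
 &\le C' \left(\sup_{y \in \mathbb{R}} |J_{n,\theta}(y) - J_\theta(y)| + \sup_{y \in \mathbb{R}} |\tilde F_{n,\theta}(y) - F_\theta(y)|\right).
\end{aligned}
$$

Let us now denote

$$T_{1,n}^{(1)} = \sup_{\theta \in \Phi} \sup_{y \in \mathbb{R}} |J_{n,\theta}(y) - J_\theta(y)|,$$

$$T_{1,n}^{(2)} = \sup_{\theta \in \Phi} \sup_{y \in \mathbb{R}} |\tilde F_{n,\theta}(y) - F_\theta(y)|.$$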

Convergence rate of $T_{1,n}^{(1)}$. For simplicity we will suppose that proj$_2(\Phi) \subset [0, A]$,
where $A$ is a nonnegative real number and for all $(x, y) \in \mathbb{R}^2$, proj$_2 : (x, y) \mapsto y$.
Let us introduce $P_n^X = \frac{1}{n}\sum_{i=1}^{n} \delta_{X_i}$ the empirical measure associated to the
iid sample $(X_1, \dots, X_n)$ with common probability distribution $P^X$ (with pdf and
cdf resp. denoted by $h$ and $H$). We use the functional notation $Pf = \int f\,dP$.
Notice now that, according to expression (17), we have for all $y \in \mathbb{R}$,

$$
\begin{aligned}
J_{n,\theta}(y) - J_\theta(y) &= \frac{1}{n}\sum_{i=1}^{n} F_0(y + \alpha + \beta X_i) - E\left(F_0(y + \alpha + \beta X)\right) \qquad (51) \\
 &= (P_n^X - P^X)\,F_0(y + \alpha + \beta\,\cdot).
\end{aligned}
$$
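Let us consider the class of functions

$$\mathcal{F}_0 = \{x \mapsto F_0(u + \beta x);\ u \in \mathbb{R},\ \beta \in [0, A]\}.$$

Since

$$(P_n^X - P^X)F_0(u + \beta\,\cdot) = (P_n^X - P^X)F_0(u + \beta(\cdot \vee 0)) + (P_n^X - P^X)F_0(u + \beta(\cdot \wedge 0)),$$

it is enough to study the empirical process indexed by the classes of functions

$$\mathcal{F}_0^+ = \{x \mapsto F_0(u + \beta(x \vee 0));\ u \in \mathbb{R},\ \beta \in [0, A]\},$$
$$\mathcal{F}_0^- = \{x \mapsto F_0(u + \beta(x \wedge 0));\ u \in \mathbb{R},\ \beta \in [0, A]\}.$$

For simplicity we denote $\Gamma_{u,\beta}(x) = F_0(u + \beta(x \vee 0))$ and only consider the class
$\mathcal{F}_0^+$, the class $\mathcal{F}_0^-$ being treated in entirely the same way. Since $F_0$ is a cdf, for
$\beta_1 \le \beta \le \beta_2$ and $u_1 \le u \le u_2$ we have

$$\Gamma_{u_1,\beta_1}(x) \le \Gamma_{u,\beta}(x) \le \Gamma_{u_2,\beta_2}(x), \quad x \in \mathbb{R},$$

and, since $F_0$ is supposed to be Lipschitz,

$$0 \le \Gamma_{u_2,\beta_2}(x) - \Gamma_{u_1,\beta_1}(x) \le c\,(u_2 - u_1 + (\beta_2 - \beta_1)(x \vee 0)).$$

Let us consider now $\varepsilon > 0$, and $(u_\varepsilon^+, u_\varepsilon^-) \in \mathbb{R}^2$ such that

$$F_0(u_\varepsilon^+) > 1 - \varepsilon, \quad \text{and} \quad F_0(u_\varepsilon^-) < \varepsilon.$$

Note that $u_\varepsilon^+$ and $u_\varepsilon^-$ do not depend on $\beta$. For all $i \in \mathbb{N}$, define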

and consider N{e) the smallest integer such that Ui^^ — Ui-i^e ^ ^ for i = 
2, . . . , iV(£). We denote by [•] the integer part function. For all e small enough 
we clearly have 

N(e) < 

£ 

Let us now define ctj^g = s{i — 1), i = 1, . . . , M{e), where M{e) = \\A + l\/e~\ 
and thus aM{s),s > A- Observe in addition that 

||r„,+i,„/3,+i, - ^u,,,,0j\lpx = c'E {{ui+i,, - Ui,, + iPj+i,, - Pj,,)iXi A 0))^) 

= 2cV (1 + . 

Hence the expression 

[r„,+MA-+,,-r„,,,^J, l<^<iV(£), l<j<M{e), 

is a (^cy^2{l + )) ^-covering of J^q*" in the L2(P^)-norm sense. Using the 
standard notation iV[](-) (see van der Vaart and Wellner [25]) the covering 
number of the class Tq" is bounded as follows 

Ti — II 

TVo (£, J-+,L2(P^)) < cNie)Mie) < c'^^. 
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— Up 



Ue — Up 
< 2— = 



Thus if there exist constants C and V such that 



\u;\A\u^\<C/e\ (52) 

we get N(e)M(s) < C/£^+^ which allows us to use Theorem 2.14.9, p. 246 in 
[25] since their Condition (2.14.7), p. 245 is then satisfied after replacing their 
constant V hy V + 2. Let us discuss condition (52). For e small enough this 
condition is true if < C/e^ and Ue> —C/e^. Denoting by the quantile 
function of Fq, condition (52) becomes 

F^{1 -e)< C/e^ and F^{e) > -C/e^. (53) 

Wc consider for simplicity the first condition in (53) (the second one being 
treated in the same way); it is equivalent to FQ^C/e^) > 1 — e, and taking 
t = C/s^ this condition turns into 

Fo{t) > 1 - C/t^/^. (54) 

Thus it suffices to have 

lnninf-'°f 7,^°M>>0. (55) 
t^oo log(i) 

Finally, using the symmetry of /o, condition (52) holds if 

^■^■^^ -21ogFo(-t) ^ (56) 

t^oo log(t) ^ ^ 

which is insured by condition (R) vi). In conclusion if (56) is satisfied and 
E{Xf) < oo then, according to Theorem 2.14.16, p. 248 in van der Vaart and 
Wellner [25], we obtain 



sup II 4,, - JeWoo < \\Pn - P ho = Oa.s{n-^^ ^''), 7 > 0- 
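In the Gaussian case the tail condition is easy to check numerically. The sketch below assumes (56) is read as positivity of $\liminf_{t \to \infty} -\log F_0(-t)/\log(t)$ with $F_0$ the standard Gaussian cdf, and evaluates the ratio on a few increasing values of $t$; the function names `left_tail` and `tail_exponent` are hypothetical.

```python
import math


def left_tail(t):
    """F_0(-t) for the standard Gaussian cdf F_0."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_exponent(t):
    """Ratio -log F_0(-t) / log(t) whose liminf must be positive (assumed reading of (56))."""
    return -math.log(left_tail(t)) / math.log(t)


# The ratio grows roughly like t^2 / (2 log t), so it stays bounded away from zero.
values = [tail_exponent(t) for t in (2.0, 4.0, 8.0, 16.0)]
print(values)
```

Any distribution with polynomially decaying tails would also satisfy the condition, with the ratio converging to the tail index instead of diverging.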



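Convergence rate of $T_{1,n}^{(2)}$. Recall that $F_\theta$ is the cdf of $Y_1 - \theta \odot X_1$, i.e.

$$F_\theta(y) = P(Y_1 - \theta \odot X_1 \le y),$$

and

$$F_{n,\theta}(y) = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}_{Y_i - \theta \odot X_i \le y}.$$

Let $K$ be a kernel satisfying (K). The $K$-regularized versions of $F_\theta$ and $F_{n,\theta}$ are

$$\tilde F_\theta = K * F_\theta, \quad \tilde F_{n,\theta} = K * F_{n,\theta}.$$

Let us denote by $P_n^{X,Y}$ the empirical measure

$$P_n^{X,Y} = \frac{1}{n}\sum_{i=1}^{n} \delta_{(X_i, Y_i)},$$

and by $P^{X,Y}$ the law of $(X_1, Y_1)$.

The set of functions $(x, y) \mapsto ax + by + c$ being a 3-dimensional
vector space, Corollary 2.5 in Kuelbs and Dudley [19] shows that the class of
sets

$$\mathcal{C} = \left\{\{(u, v) \in \mathbb{R}^2 : au + bv + c \le 0\};\ (a, b, c) \in \mathbb{R}^3\right\}$$

is a Strassen log-log class, which implies that a.s.

$$\limsup_{n \to \infty} \sup_{C \in \mathcal{C}} \sqrt{\frac{n}{2\log\log(n)}}\,(P_n^{X,Y} - P^{X,Y})(C) = \sup_{C \in \mathcal{C}} \sqrt{P^{X,Y}(C)\left(1 - P^{X,Y}(C)\right)} \le 1/2.$$

Since $\mathcal{C}$ contains the class

$$\mathcal{S} := \left\{\{(u, v) \in \mathbb{R}^2 : v - (\alpha + \beta u) \le y\};\ (\alpha, \beta, y) \in \Phi \times \mathbb{R}\right\},$$

and since, for every set $S \in \mathcal{S}$, $P^{X,Y}(S) = \int_S dP^{X,Y}(u, v) = P(Y - (\alpha + \beta X) \le
y) = F_\theta(y)$ and for the same reason $P_n^{X,Y}(S) = F_{n,\theta}(y)$, we have

$$\limsup_{n \to \infty} \sup_{(\theta, y) \in \Phi \times \mathbb{R}} \sqrt{\frac{n}{2\log\log(n)}}\,(F_{n,\theta} - F_\theta)(y) \le 1/2 \quad \text{a.s.} \tag{57}$$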

Now if we replace Fn^g by its regularized version Fn^g the approximation is 
controlled as follows, 

Fn,e{y) - Pnfiiy) 

= - E{F^,e{y)) + E{Fn,0{y)) - Fe{y) + - F„,,(t/) 

+ (58) 
recalling that $E(F_{n,\theta}(y)) = F_\theta(y)$. The first term on the right hand side of
(58) satisfies



K{y-u)d{Fn,e-E{Fn,e)){u) 



= [iFr,,e-EiF„,o))iu)dKiy-u) 



= [{Fr,,0-E{Fr,,e)){y-s)dK{s). 

Jm. 

Thus, if we denote A„,e(2/) := F^fiiy)- E{Fn,e{y)) = Fn^iv) - Fe{y) , we obtain 

- E{F^,e{y)) - WnM - E{F^,e{y))\ 



< 



< sup \I^n,0{y)\\\K\\TV ■ 

(9,j/)G<I>xM 

The last bias-term on the right hand side of (58) can be studied using the i?2n 
bound in [22], p. 766, equation (e), which establishes that for each ^ e $ 



sup|E(F„,,)-F,|(|/) < 



(59) 



If K is replaced by Kn{-) = K{-/bn) and we let k2,n '■= / x^dKn{x) = 6^/c2, 
then (57-59) lead to 



lim sup 



n 



sup {Fn,e - Ee){y) < oo a.s., 



n— >-oo 

V log log ( 

whenever limsup(n/ loglog(n))-'^/^A;2,„ < oo which holds when 

lim sup ^ / ^ — /" ^ < oo. 



(60) 



(61) 



log log(n) 

and supgg$ ll/elloo < oo which has been proved in Section 5.3 under Condition 
(R) ii). 

Uniform almost sure rate of convergence of $T_{2,n}$. Considering for all $i \ge 1$ the
random variable $W_i(\vartheta) := H^2(V_i; \vartheta)$, where $\vartheta \in \Theta$, we see that

$$\sup_{\vartheta \in \Theta} T_{2,n}(\vartheta) = \sup_{\vartheta \in \Theta} \left|\frac{1}{n}\sum_{i=1}^{n} W_i(\vartheta) - E(W_i(\vartheta))\right|,$$

where the right hand term is the supremum of an empirical process indexed
by a class of Lipschitz bounded functions, which is known to be $O_{a.s.}(n^{-1/2 + \gamma})$
for all $\gamma > 0$ (see [2] for details), which concludes the proof.